1. Problem statement:
Write and execute a MapReduce program to identify the top 100 trending songs from
Saavn's stream data, on a daily basis, for the week of December 25-31, 2017. A stream is a
record of a user playing a song. Each stream is represented as a tuple with the following
attributes:
For example, if a song was streamed on the date immediately preceding the target date, the
difference in dates is 1 and the weight is the full 1.0. If it was streamed two days before the
target date, the weight is 0.8. In this scheme, the further back from the target date a song was
streamed, the less that stream contributes to the song's total play count.
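The decay rule above can be sketched as follows. This is an illustrative fragment, not the project's code; it assumes a decay factor of 0.8 (matching the "decaypoint8" configurations named later) and exponential decay per extra day of distance from the target date.

```java
public class DecayWeight {
    static final double DECAY = 0.8; // assumed decay factor, per the "decaypoint8" configs

    // daysBack = difference in days between the stream date and the target date.
    // daysBack == 1 -> 1.0, daysBack == 2 -> 0.8, daysBack == 3 -> 0.64, ...
    static double weight(int daysBack) {
        return Math.pow(DECAY, daysBack - 1);
    }
}
```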
The weights are then summed to find the total weight for each song. The songs are sorted
by this sum, and the 100 songs with the highest sums are chosen as the top 100 trending
songs.
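Selecting the highest-weighted songs can be sketched with a bounded min-heap, which avoids sorting the full song list. The class and method names here are illustrative, not taken from the project.

```java
import java.util.*;

public class TopSongs {
    // Given songId -> total decayed weight, return the top-k song ids,
    // highest total weight first.
    static List<String> topK(Map<String, Double> weights, int k) {
        // Min-heap of size k: the smallest of the current top-k sits at the root.
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // evict the lowest weight
        }
        List<String> top = new ArrayList<>();
        while (!heap.isEmpty()) top.add(heap.poll().getKey());
        Collections.reverse(top); // heap drains lowest-first, so reverse
        return top;
    }
}
```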
3. MapReduce design:
The entire task is achieved in a single MapReduce phase consisting of a mapper, a combiner,
a partitioner and a reducer.
The value is the ValuesPair custom object containing the weight and the timestamp.
Mapper value: ValuesPair
Note that the key needs the timestamp, as the timestamp is the basis on which the spike-filtering
algorithm works. Since records for a song with a given timestamp can be spread across different
mappers, this filtering cannot be done in the combiners; we must wait until the reduce phase to
process these timestamps and perform the filtering.
The S3Connector and AWS CLI are installed as described in the project resources module.
where window is the desired window size in the sliding-window algorithm and spikefactor is
the spike factor described in the algorithm section.
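The algorithm section itself is not reproduced here, so the following is only a plausible sketch of one common form of sliding-window spike filtering: a count is flagged as a spike when it exceeds spikefactor times the mean of the preceding window counts. The class, method, and parameter names are hypothetical.

```java
public class SpikeFilter {
    // Hypothetical sliding-window spike check: flags position i when counts[i]
    // exceeds spikeFactor times the mean of the preceding `window` counts.
    static boolean isSpike(int[] counts, int i, int window, double spikeFactor) {
        if (i < window) return false; // not enough history to compare against
        double sum = 0;
        for (int j = i - window; j < i; j++) sum += counts[j];
        double mean = sum / window;
        return counts[i] > spikeFactor * mean;
    }
}
```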
The MapReduce output files are then downloaded to the PC using WinSCP with the same
credentials as mentioned before.
The output files each correspond to a date, starting at 25-12-2017 (part-r-00000) and ending,
in sequence, at 31-12-2017 (part-r-00006). The files are renamed accordingly to indicate the
date and placed in an output folder with a name of the following form:
output-window1-decaypoint8-spikefactor1000-dates
Similarly, the gold-standard set of files for December 2017 provided in S3 is transferred to
EC2 using copyToLocal and then downloaded to the PC using WinSCP.
In the project, the gold-standard file is placed in the resources/goldstandard folder, while the
various sets of MapReduce output files are placed (one folder per configuration tested) in the
resources/output folder.
Finally, for each date, the intersection set is computed using retainAll().
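The retainAll() step can be sketched as below. The helper name is illustrative; the key point is copying the output set first, since retainAll() mutates the set it is called on.

```java
import java.util.*;

public class Overlap {
    // Intersection of one day's MapReduce output song ids with the
    // gold-standard song ids, via java.util.Set#retainAll.
    static Set<String> intersection(Set<String> output, Set<String> gold) {
        Set<String> common = new HashSet<>(output); // copy so inputs are untouched
        common.retainAll(gold);                     // keep only ids also in gold
        return common;
    }
}
```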
The AnalyseOutputs class is updated to contain a list of all the directories in the output folder.
It is then run from Eclipse using Run As -> Java Application.
The output of the program is, for each directory, the number of songIds that overlap with the gold
standard for each date of interest (25-31 Dec 2017), along with the list of overlapping songs. This
is written to the file results.txt in the resources/analysisresult/ folder of the DataAnalysis project.
6. Conclusions:
It is seen that window size 2 performs better than window size 5, and the overlap size is greater
than 60 for all dates. Changing the spike factor from 1000 to 10 does not appear to have a
significant effect.
The best result was for configuration
output-window2-decaypoint8-spikefactor1000/