A Fraud Detection System for Mobile Applications
Abstract

Since technology is advancing rapidly, the mobile app market is growing very fast and the number of mobile apps is increasing day by day. Every app developer wants their app to rank as high as possible in the popularity list, so to maximize popularity some app developers resort to unfair and tricky means: they use "bot farms" and "human water armies" to download their apps and provide fraudulent ratings and reviews. The challenge is to detect such fraudulent activities. In this paper we develop a system which can detect such fraudulent behavior by mining an app's historical records. We mine three types of evidences from the app's historical records, namely ranking based evidences, review based evidences and rating based evidences, and then calculate an aggregation of these three evidences.

Index Terms – Ranking based evidences, Rating based evidences, Review based evidences, Mining Leading Session, Sentiment analysis.
Introduction

In today's world technology is advancing expeditiously, and mobile devices are a part of this technology. The number of mobile users is increasing day by day; 6.2 percent of the world's population has mobile devices, so the mobile application is also a well-known concept. To date there are over 3.6 million applications in the Google Play Store and 2.2 million applications in the Apple App Store. The number of application developers is also increasing gradually, so there is huge competition among application developers. As there is a massive number of applications, it is difficult for us to choose the right ones. Every application developer wants their applications to be popular so that they can get the maximum number of downloads and thus the maximum revenue. The applications leaderboard is a platform from which we can gauge how popular an application is, and these leaderboards are the best way to popularize an application: a topmost position in the leaderboard indicates that the application is popular. Top-ranked applications generally have more downloads and earn millions of dollars, so application developers have a tendency to investigate different ways of getting a higher position in the leaderboard [3]. An application with a large number of downloads, ratings and reviews is usually ranked high in the leaderboard.

There are different ways to popularize an application, mainly divided into two categories: white hat promotion is the legal way to promote an application, whereas black hat promotion is the illegal way. Shady application developers generally use black hat techniques to promote their applications, using fraudulent or tricky means to boost their application in the popularity list. This is usually implemented by "bot farms" or "human water armies" which fill up application downloads, ratings and reviews in a very short time [2][3]. If we observe carefully we find that an application is not always ranked high in the leaderboard but only in certain periods, composed of leading events that form leading sessions; fraud for mobile applications particularly occurs in these sessions. So detecting fraud in an application is nothing but finding fraud in its leading sessions. Therefore we first need to find the leading sessions of a mobile application, and then evaluate those sessions against the application's historical records.

The main objective of our work is to detect fraudulent behavior in mobile applications by mining the apps' historical records. We check whether there is any fraud signature in any leading session. We first collect three types of evidences, namely 1) ranking based evidences, 2) rating based evidences and 3) review based evidences. Since this project mines user feedback, we consider two types of user feedback: rating based and review based. We generally rate an app while downloading it or after seeing its performance, so rating is one of the important evidences for judging an app; but, as discussed above, there are techniques with whose help the rating can be inflated [2]. Many people download applications after reading user reviews, so shady application developers may inflate their applications with fake comments. Here we have designed a system that detects whether any such activities have been carried out to increase the popularity of an application. We first determine the active periods, using an algorithm we have designed for this purpose. By mining the application's historical ranking records we get the leading events, and by combining adjacent leading events we get leading sessions. We then evaluate these leading sessions against the three types of evidences. We first separate the statistical and the textual reviews and map them to each leading session. To the textual reviews we apply Natural Language Processing (NLP) to get the sentiment of the reviews. Then for each leading session we determine the overall sentiment and check whether there is any anomalous pattern.
Review of Literature

Patil Rohini et al. [1] discussed that almost everyone uses a mobile phone these days, and with it a mobile app store. We can get any number of applications from these app stores, but some applications may be used for data robbery; such applications should be detected and made identifiable to users. They proposed a web application that processes an application's historical records with different techniques and presents the results in graph form; comparisons between applications are then made from the graphs.

Ranjitha R. et al. [12] state that ranking fraud is the key challenge in the mobile application market. According to them, ranking frauds are fraudulent or vulnerable activities whose purpose is to bump an app up the popularity list.

Hengshu Zhu et al. [2] give us the idea of mining active periods, namely the leading sessions of applications. They also identified various types of evidences, mainly rating based, review based and ranking based evidences, and further proposed an optimization-based aggregation method to integrate all the different types of evidences.

Shivakumar Swamy N. et al. [4] discussed that data mining techniques can be used for detecting fraud. They discussed different techniques that can be used to detect anomalies in datasets and gave a brief description of some of them.

Abhilash T P et al. [3] give us the overall idea of ranking fraud. They first introduced the concept of active periods for mobile applications, showing that an application is not always ranked high in the leaderboard but only in some active periods known as leading sessions. They also give the basic idea of how active periods can be obtained by mining an application's historical records. Further, they investigate three types of evidences through statistical hypothesis tests.

L. Velmurugan [5] gives the basic idea of the techniques used in misuse detection and anomaly detection. This paper also gives an overview of mining the leading session, as well as an algorithm for fraudulent ranking behavior detection with Concept Vector Based Review Evidence Analysis (CVBREA).

Proposed System

The objective of our work is to find fraudulent ranking behavior in mobile applications. Fraud generally happens in the leading sessions. Fraudulent application developers use tricky or unfair means to push their app's ranking up the leaderboard. Detection of such apps is done by building leading sessions from the leading events, which show the phases of achievement: the rising phase, the maintaining phase and the recession phase, in which we observe an app's ranking behavior from its historical ranking records. When these phases are checked over time, the ranking of a genuine app stays consistent over time periods, whereas that of a fraudulent app fluctuates. Therefore we have to characterize fraud evidences from the apps' historical ranking records, and we also consider two other types of evidences based on the apps' historical reviews and ratings.

In ranking based evidences, each leading event must show a specific ranking pattern in the history of the app's ranking behavior [10].

In rating based evidences: once an app is published, users can rate it after downloading. User rating is one of the most important forms of advertising for applications; apps with higher ratings attract more users to download them and rank higher on the leaderboard. Therefore rating is an important evidence of ranking fraud.

In review based evidences: similar to ratings, app stores also allow users to write their feedback as app reviews. Users describe their experience with a particular mobile app through reviews, so reviews also play an important role in ranking fraud.

Finally, an evidence-based aggregation method is used to integrate all the evidences [11].
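The paper does not state the aggregation formula itself; the sketch below assumes a simple weighted linear combination of the three per-session evidence scores, with illustrative weights and an illustrative threshold (Zhu et al. [2] derive their aggregation by optimization, which is not reproduced here):

```python
# Minimal sketch of evidence aggregation for one leading session.
# The weights and the 0-1 scaling of the scores are assumptions made
# for illustration; the paper only states that the evidences are combined.

def aggregate_evidences(rank_score, rating_score, review_score,
                        w_rank=0.4, w_rating=0.3, w_review=0.3):
    """Combine three per-session evidence scores (each scaled to [0, 1])
    into a single fraud score; higher means more suspicious."""
    assert abs(w_rank + w_rating + w_review - 1.0) < 1e-9
    return w_rank * rank_score + w_rating * rating_score + w_review * review_score

# Example: a session with a suspicious ranking pattern but middling ratings.
score = aggregate_evidences(rank_score=0.9, rating_score=0.4, review_score=0.7)
print(f"fraud score = {score:.2f}")  # flag the session if it exceeds a chosen threshold
```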
Proposed Approach

We first read the datasets and, after preprocessing, separate the statistical reviews from the textual reviews. The statistical reviews are then mapped to sessions, and each session is checked separately. If the sessions are found to be evenly organized, the chances of the reviews being fake are low; if they are abruptly organized, the chances of the reviews being fake are high. For example, if for session S1 the mean review is excellent but for session S2 the mean review drops sharply, the reviews in session S1 are probably not genuine and might be paid reviews. After completing the statistical reviews we consider the textual reviews and apply NLP to them. The NLP process consists of two parts: Part-of-Speech tagging, which finds the part of speech of each input word, and chunking, which removes all the unnecessary parts of speech from the reviews and keeps only the action words. We process these action words and determine the overall sentiment, then check the sentiments session-wise to find whether the reviews are fake or genuine. The composite results of both the statistical reviews and the textual reviews identify the true nature of the reviews and generate the results.
Advantages of the proposed system: the proposed framework is extensible and can be continued by considering other evidences for ranking fraud detection.
Proposed System Architecture

Fig. 1: Block diagram of the system architecture
Mining Leading Session

As fraud usually happens in leading sessions, leading sessions are the basis for detecting fraud in mobile apps [13][8]. There are mainly two steps in mining leading sessions. First, we determine the leading events from the application's historical ranking records. Second, we merge adjacent leading events to obtain a leading session [9]. Observation shows that an application is not always ranked high in the leaderboard but only at some specific times, called leading events. A fraudulent mobile application generally exhibits different ranking patterns in its leading sessions compared to normal applications, so the problem of identifying fraudulent behavior in a mobile application reduces to finding its vulnerable leading sessions [3].

Leading sessions are the active periods of a mobile application, and our first task is to find them. We have used an algorithm for mining the leading sessions from an application's historical ranking records. The pseudo code for mining the leading sessions of an application a is sketched below.
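The pseudo code itself did not survive in the source; the following minimal Python reconstruction follows the two steps described above, with an assumed top-K ranking threshold `K` and an assumed merge gap `phi` in days (both hypothetical parameters):

```python
# Reconstruction of the two-step mining procedure described above.
# `records` is a list of (day, rank) pairs sorted by day; K and phi are
# assumed parameters: the rank threshold for a leading event and the
# maximum gap (in days) allowed when merging adjacent events.

def mine_leading_sessions(records, K=300, phi=7):
    # Step 1: find maximal runs of days where the app ranks within the top K.
    events = []
    start = prev = None
    for day, rank in records:
        if rank <= K:
            if start is None:
                start = day
            prev = day
        elif start is not None:
            events.append((start, prev))
            start = None
    if start is not None:
        events.append((start, prev))

    # Step 2: merge adjacent events separated by fewer than phi days.
    sessions = []
    for event in events:
        if sessions and event[0] - sessions[-1][-1][1] < phi:
            sessions[-1].append(event)   # extend the current session
        else:
            sessions.append([event])     # start a new session
    return sessions
```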
57  
60  
61  
62  
63  
64  
65  
3
Identifying evidences

Ranking based evidences:

Leading events are the active periods of mobile applications, so we should first study the basic significance of leading events in order to obtain fraud evidences. By analyzing the apps' historical ranking records, we observe that an app's ranking behavior in a leading event always satisfies a specific ranking pattern consisting of three different ranking phases, namely [9][12]:

 Rising phase: when an application's ranking increases to a top position in the leaderboard, it is in the rising phase.
 Maintaining phase: the period for which an application remains in its top position is known as the maintaining phase.
 Recession phase: the phase of decline of the app's rank is called the recession phase.
Rating based evidences:

After downloading an app, users generally rate it. The rating given by users is one of the most important features for the popularity of an app. Shady application developers usually inflate their apps with fake ratings so that they can get the maximum number of downloads, so rating based evidence is also an important factor that needs to be considered. Generally, ratings are between one and five; here we consider a threshold value to classify the ratings into two parts: ratings less than or equal to three are considered negative, and ratings above three are considered positive.
Review based evidences:

Users give their feedback on a particular application after downloading it or after experiencing its performance. Many people go through these feedbacks before downloading an application, so fraud may happen in user reviews as well: shady application developers may inflate their application with false comments or reviews. Review based evidence is therefore also a very important factor for detecting fraudulent behavior in mobile applications. Reviews are usually given in natural language, so they need to be preprocessed with Natural Language Processing (NLP) [6][7].

Preprocessing of reviews

1. Tokenization: the process of breaking a stream of text into words, phrases, symbols or other meaningful elements called tokens.
2. Stop word removal: stop words are commonly used words such as a, the, and, for, from, is, in, etc.
3. Stemming: stemming is done to find the root word. For English, the Porter stemmer algorithm can be used to remove suffixes and obtain the stem.
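The paper does not name a library for these steps; a minimal sketch using NLTK, whose tokenizer, stop-word list and Porter stemmer match the three steps above, could look like this:

```python
# Minimal review-preprocessing sketch with NLTK: tokenize, drop stop words,
# then stem with the Porter algorithm, as in steps 1-3 above.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)       # tokenizer model
nltk.download("stopwords", quiet=True)   # stop-word list

def preprocess(review):
    tokens = word_tokenize(review.lower())                          # 1. tokenization
    stop = set(stopwords.words("english"))
    tokens = [t for t in tokens if t.isalpha() and t not in stop]   # 2. stop word removal
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]                        # 3. stemming

print(preprocess("This app is amazing and works flawlessly"))
# approximate output: ['app', 'amaz', 'work', 'flawlessli']
```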
We then have to find the overall user reviews, map these reviews to sessions, and check for fraudulent behavior. We have created a module for finding the overall sentiment of the reviews.

Block diagram of sentiment analysis module

Fig. 2: Block diagram of the sentiment analysis module
Algorithm:
1. Read all the feedback information.
2. Divide the information into sessions.
3. For each session, find the feedback obtained, to get the list S1 F1, S2 F2, S3 F3, ..., Sn Fn, where Si is a session and Fi is the feedback from that session.
4. Check whether the feedbacks have a common trait: if (F1 = F2 and F2 = F3 and ... and Fn-1 = Fn), the reviews are genuine; else, if there is an abrupt shift in the pattern, the feedback might be non-genuine.
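A minimal sketch of steps 1-4, assuming per-session mean ratings as the feedback values Fi and a tolerance `eps` (an assumed parameter, since a strict equality test is too brittle for real-valued means):

```python
# Steps 1-4 above: group feedback by session, compute each session's mean
# rating, and flag an abrupt shift between consecutive sessions.
from statistics import mean

def session_feedback(feedback):
    """feedback: list of (session_id, rating) pairs -> {session_id: mean rating}."""
    by_session = {}
    for sid, rating in feedback:
        by_session.setdefault(sid, []).append(rating)
    return {sid: mean(r) for sid, r in sorted(by_session.items())}

def is_genuine(feedback, eps=1.0):
    """Genuine if consecutive session means never differ by more than eps."""
    means = list(session_feedback(feedback).values())
    return all(abs(a - b) <= eps for a, b in zip(means, means[1:]))

data = [("S1", 5), ("S1", 5), ("S2", 4.5), ("S3", 1), ("S3", 2)]
print(is_genuine(data))  # False: the drop from S2 to S3 is abrupt
```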
For the NLP-based technique:
1. Read all the feedback information.
2. For each feedback, find the action words using POS tagging and the chunking process.
3. Evaluate the sentiment of the feedback and mark the feedback as Good or Bad.
4. Divide the feedback into sessions.
5. For each session, find the feedback obtained, to get the list S1 F1, S2 F2, S3 F3, ..., Sn Fn, where Si is a session and Fi is the feedback from that session.
6. Check whether the feedbacks have a common trait: if (F1 = F2 and F2 = F3 and ... and Fn-1 = Fn), the reviews are genuine; else, if there is an abrupt shift in the pattern, the feedback might be non-genuine.

The results from both algorithms are combined to conclude whether the given feedback is genuine or not.
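A minimal sketch of steps 2-3 with NLTK: POS tagging, a chunk grammar that keeps adjectives and verbs as the "action words", and a tiny sentiment lexicon (the lexicon and the chunk grammar are illustrative assumptions; the paper does not specify them):

```python
# Steps 2-3 above: POS-tag a review, chunk out adjectives/verbs as action
# words, and score them against a small sentiment lexicon.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Illustrative lexicon; a real system would use a full sentiment dictionary.
LEXICON = {"great": 1, "love": 1, "excellent": 1, "crash": -1, "bad": -1, "hate": -1}

def review_sentiment(review):
    tagged = nltk.pos_tag(nltk.word_tokenize(review.lower()))
    # Keep only adjectives (JJ*) and verbs (VB*) as the action words.
    grammar = nltk.RegexpParser(r"ACTION: {<JJ.*|VB.*>+}")
    tree = grammar.parse(tagged)
    words = [w for subtree in tree.subtrees(lambda t: t.label() == "ACTION")
             for w, _ in subtree.leaves()]
    score = sum(LEXICON.get(w, 0) for w in words)
    return "Good" if score >= 0 else "Bad"

print(review_sentiment("I love this excellent app"))      # Good
print(review_sentiment("It keeps crashing, really bad"))  # likely Bad
```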
Results and Discussions

We have collected the historical records of the applications. By mining the historical ranking records we obtained the leading sessions. An application with 3 months of historical ranking records and k = 180 is shown in the following graph:

Fig. 3: Leading sessions

The overall rating and the overall review sentiment are then calculated and listed for every session.

Fig. 16: Overall user review for all the sessions

Then, to check how the reviews change across sessions, we used K-means clustering: we made two clusters and calculated the distance between them.

Fig. 17: Centroids of the clusters

Fig. 4: Clusters of the sessions

In the clusters above we can see that there is only one element in the second cluster, and it is far from the first cluster, which indicates a shift in the user reviews. Thus there may be fraud signatures in the user reviews of that session. We can now consider different applications, build clusters of their sessions, and tabulate the centroids and the distances between them to compare the results.
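A minimal sketch of this clustering step with scikit-learn, assuming one numeric overall review score per session (the values are illustrative):

```python
# Cluster per-session review scores into two groups and measure how far
# apart the cluster centroids are; an isolated, distant cluster suggests
# a session whose reviews shifted abruptly.
import numpy as np
from sklearn.cluster import KMeans

# Illustrative per-session overall review scores (one value per session).
scores = np.array([[4.6], [4.5], [4.7], [4.4], [1.8]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
print("labels:", km.labels_)  # e.g. [0 0 0 0 1] - one outlying session
centroid_distance = np.linalg.norm(km.cluster_centers_[0] - km.cluster_centers_[1])
print("distance between centroids:", round(float(centroid_distance), 2))
```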
Conclusion

We have designed a system for detecting fraudulent behavior in mobile applications. Many shady application developers use unethical methods to increase the popularity of their applications. We have shown that this type of fraud generally happens in the active periods, i.e. the leading sessions, of an application, and we have designed an algorithm for mining those leading sessions from the application's historical ranking records. The system aims to detect fraud based on three types of evidences: ranking based, rating based and review based. Further, an optimization-based aggregation method combines all three evidences to detect the fraud. A unique perspective of this approach is that all the evidences can be modeled by statistical hypothesis tests, so the system is easy to extend with other evidences from domain knowledge to detect ranking fraud.
References

[1] Patil Rohini, Kale Pallavi, Jathade Pournima, Prof. Pankaj Agarkar, "MobSafe: Forensic Analysis for Android Application and Detection of Fraud Apps Using Cloud Stack and Data Mining", International Journal of Advance Research in Computer Engineering and Technology (IJARCET), Vol. 4, Issue 10, October 2015.
[2] Hengshu Zhu, Hui Xiong, Yong Ge, and Enhong Chen, "Discovery of Ranking Fraud for Mobile Apps", IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 1, January 2015.
[3] Abhilash T P, L Dinesha, "Ranking Detection and Avoidance Frauds in Mobile Apps Store", International Journal of Advance Networking and Application, ISSN: 0975-0282.
[4] Shivakumar Swamy N, Prof. Sanjeev C. Lingareddy, "Fraud Detection Using Data Mining Techniques", International Journal of Innovations in Engineering and Technology (IJIET).
[5] L. Velmurugan, "Latent Relation Analysis based Discovering Fraudulent Ranking Identification on Mobile Web Apps", Indian Journal of Science and Technology, Vol. 8(34), DOI: 10.17485/ijst/2015/v8i34/74505, December 2015.
[6] Shreya Banker and Rupal Patel, "A Brief Review of Sentiment Analysis Methods", International Journal of Information Sciences and Techniques (IJIST), Vol. 6, No. 1/2, March 2016.
[7] Xuanfan Wu, "Metrics, Techniques and Tools of Anomaly Detection: A Survey". (Online). Available: https://www.cse.wustl.edu/~jain/cse567-17/ftp/mttad/index.html
[8] Raghuveer Dagade, Prof. Lomesh Ahire, "Review: A Ranking Fraud Detection System For Mobile Apps", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 3, Issue 11, November 2015.
[9] Javvaji Venkataramaiah, Bommavarapu Sushen, Mano R., Dr. Gladis Pushpa Rathi, "An Enhanced Mining Leading Session Algorithm For Fraud App Detection in Mobile Application", International Journal of Scientific Research in Engineering (IJSRE), Vol. 1(4), April 2017.
[10] Anuja A. Kadam, Pushpanjali M. Chouragade, "A Review Paper on: Malicious Application Detection in Android System", International Journal of Computer Applications (0975-8887), National Conference on Recent Trends in Computer Science & Engineering (MEDHA 2015).
[11] L. Velmurugan, "Latent Relation Analysis based Discovering Fraudulent Ranking Identification on Mobile Web Apps", Indian Journal of Science and Technology, Vol. 8(34), DOI: 10.17485/ijst/2015/v8i34/74505, December 2015.
[12] Ranjitha R., Mathumita K., Meena S., and S. Hariharan, "Discovery of Ranking of Fraud for Mobile Apps", International Journal of Innovative Research And Management (IJIREM), ISSN: 2350-0557, Vol. 3, Issue 3, May 2016.
[13] S. Karthika and N. Sairam, "A Naive Bayesian Classifier for Educational Qualification", Indian Journal of Science and Technology, ISSN (Print): 0974-6846, ISSN (Online): 0974-5645, Vol. 8(16), July 2015.

Long-term Static Music Emotion Recognition: A supervised learning approach to model user emotion profile for Music Recommender Systems

Rwiddhi Chakraborty*, Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, India, rwiddhi.chakraborty.ece18@heritageit.edu

Aniket Dutta*, Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, India, aniket.dutta.ece18@heritageit.edu

Shubhayu Das*, Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, India, shubhayu.das.ece18@heritageit.edu

Chandrima Roy, Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, India, chandrima.roy@heritageit.edu

* Authors contributed equally to this work
Abstract—In this paper we describe and approach Static Music Emotion Recognition (MER) as a supervised learning problem. We propose a paradigm to capture the user's emotional tastes, using Arousal and Valence annotations from the user, for various modern applications, primarily music recommendation systems. The primary aim is to predict the emotional content of a piece of classical music according to the taste of a particular user. We show that our Static MER model gives satisfactory performance, even with a genre as emotionally complex as classical music. Moreover, we use the entire duration of the pieces (unlike other works in this area, which use shortened clips) in this problem. We obtained satisfactory results using comparatively smaller feature sets - 68 features for Valence and 70 for Arousal. Finally, we propose an architecture for a music recommender system that can integrate this approach effectively. This is the first time in this research space that a two-pronged approach - smaller feature sets and the entire duration of music pieces - has been used successfully, with potential for far-reaching commercial applications in the present day.

Index Terms—Music Emotion Recognition; Music Recommendation Systems; Music Information Retrieval; Supervised Learning; Support Vector Regression; Random Forest Regression; Artificial Neural Networks
I. INTRODUCTION

The problem of addressing Music Emotion Recognition (or MER) as a supervised learning problem started about a decade ago [1]. Inspired by the pioneering work of Russell [2], MER uses a 'dimensional' approach to model emotion, rather than a 'categorical' approach. Russell's dimensional model, which he called the 'circumplex model', works on the basis of two metrics: Valence, i.e. pleasantness, or positive versus negative affective states, and Arousal, i.e. activation, or energy and stimulation level. Figure 1 shows the circumplex model, where different regions of the 2D plot signify different kinds of emotion.

Fig. 1. Russell's Circumplex Model.

The problem of identifying emotion in music, which has multiple modern use cases, was taken up by various research groups in the past decade, each with a different approach. This led to the MER regression problem being further broken down into 'Static MER' and 'Dynamic MER'. The former assumes the emotion in a piece of music to be independent of time; in other words, each piece of music has a definite emotion it aims to express and can be denoted by a single point in the Valence-Arousal space. The latter, Dynamic MER, assumes that the emotion in music changes with time, so each piece of music follows a contour in the Valence-Arousal space with respect to time. In the past decade, the majority of researchers interested in Music Emotion Recognition have chosen the Dynamic MER approach, in the hope of capturing the variations of emotional expression in music. However, we suggest here that Static MER is more relevant and efficient when the use cases rely on being able to differentiate between two pieces of music based on an individual's emotional response. Under the Static MER hypothesis, this is a single comparison, which is particularly useful for a use case like music recommendation systems.
In order to suggest music to their users, music streaming services such as 'Pandora' use advanced recommendation systems which attempt to build an emotional profile of a user [3]. In one of its recent projects, 'Spotify' used 'Valence' as one of the features in its recommendation algorithm [4]. The emotional classification of a piece of music in such applications could be done by in-house music experts, which would be subject to the problem of subjectivity. Alternatively, it could be done with a machine learning approach that builds a listener's emotional profile, catering to each listener's emotional taste separately. In this work we break the problem of building a listener profile for such applications down to a simple prediction problem. The task is to show that, with a reasonable number of Valence and Arousal annotations of different pieces of music, a regression algorithm can satisfactorily predict a listener's response to new pieces of music.

What is novel about our present approach is that we attempt to train a regressor to predict the Valence-Arousal values of entire pieces of classical music (about 10-20 minutes long) according to the taste of the user. Moreover, the feature set we use is considerably smaller than those in other MER works. This ensures easy and efficient integration with use cases that require the prediction of a piece's Valence and Arousal values according to a user's taste.

The rest of this paper is organized as follows. In section two we present the data-set we used for this work, how we acquired our ground truth annotations, and the feature extraction paradigm used to describe each piece of music. In section three we discuss the pre-processing and the model training stage. In section four we present and analyze our results, and in section five our results are summarized and compared with other related works. Section six briefly discusses the use and relevance of this approach if implemented on a larger scale within a music recommendation system.
II. DATASET AND METHODOLOGY

The data used in this work has been created by us and consists of ground truth labels from only one listener. It is available on GitHub [https://github.com/ShubhayuDas/StaticMER dataset] and archived in Zenodo [https://doi.org/10.5281/zenodo.1283520]. We created this dataset using two publicly available, open-source repositories, [5] and [6].

A. Music Recordings and Ground Truth acquisition

The music data used in this work is the open source MusicNet [5] data-set, created by researchers at the University of Washington. It contains 330 classical music recordings by famous composers such as Bach, Schubert, Mozart and Beethoven. We extract relevant musical features on a long-term basis from these recordings to train and test our regression models; the feature extraction process is explained in detail in the next subsection. As mentioned in the introduction, the use case of our work is to predict the valence and arousal values of a new piece of music based on the user's taste. Our intention is not to find the absolute emotion of a musical piece, in which case the problem of subjectivity of the annotations would have to be accounted for. We are not concerned with removing subjectivity from our ground truth; we want to capture it, since the aim is to predict or replicate the listener's response itself (in terms of Valence and Arousal) for pieces outside the training set. Hence the entire ground truth data was acquired from a single person. Our data-set being one of classical music pieces, the labeling was done by a professional classical musician, teacher and conductor, Mr. Anubrata Ghatak [7].

B. Feature Extraction

For our audio feature extraction we have used the pyAudioAnalysis [6] library. Table 1 lists each of the features we have used. Each of these features can be classified as follows:

• The time-domain features (Features 1 to 3 in Table 1) are extracted directly from the raw signal samples. These features encompass information such as loudness, noise, energy and abruptness.
• The frequency-domain features (Features 4 to 34 in Table 1, apart from the MFCCs) are based on the magnitude of the Discrete Fourier Transform. The cepstral-domain results (used by the MFCCs, or Mel-Frequency Cepstral Coefficients) are found by applying the inverse DFT to the logarithmic spectrum. These features encompass information regarding timbre texture, tonality, harmony, multiplicity (the number of pitches heard), etc.
• Features 35 and 36 are tempo-related features; these encompass information regarding the tempo and, to some extent, the overall dominant rhythm.

Further details about each feature are available in the pyAudioAnalysis feature extraction documentation [6].

TABLE I
LIST OF FEATURES USED

#       Feature name
1       Zero Crossing Rate
2       Energy
3       Entropy of Energy
4       Spectral Centroid
5       Spectral Spread
6       Spectral Entropy
7       Spectral Flux
8       Spectral Rolloff
9-21    MFCCs (Mel-Frequency Cepstral Coefficients)
22-33   Chroma Vector
34      Chroma Deviation
35      BPM Rate
36      BPM Dominance

There are two algorithmic stages involved in the long-term audio feature extraction for features 1 to 34 (see Table 1):
• Short-term feature extraction is carried out first. It splits the input signal into short-term windows (or frames) and computes a number of features for each frame, leading to a sequence of short-term feature vectors for the whole signal. We used a short-term window size of 50 ms and a step size of 25 ms; so, if one short-term frame starts at 1.000 s and ends at 1.050 s, the next starts at 1.025 s and ends at 1.075 s, giving a 50 percent overlap. This extracts the set of 34 features (Features 1 to 34 in Table 1) from each short-term frame of the recording.
• Then a mid-term window and step are specified. For each mid-term segment, after the short-term feature extraction is carried out, the feature sequence from that segment is used to compute feature statistics (e.g. the mean and standard deviation of the ZCR). Each mid-term segment is therefore represented by the mean and standard deviation over each of its short-term feature sequences. The mid-term window we used is 2 seconds, with a mid-term step of 0.2 seconds (i.e. 90 percent overlap).
• Only after extracting these short-term and mid-term features are the averages taken, in order to produce one long-term feature vector per audio file. Hence each of features 1-34, which follow this paradigm, consists of two parameters: its mean and its standard deviation over the mid-term features.

We thus obtain a total of 68 features from this process (34 + 34). Along with the two rhythm features, we end up with a 330 X 70 feature matrix for the entire set of songs. Different combinations of these were used while training; we elaborate on this in the next section.
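A minimal sketch of this two-stage extraction with pyAudioAnalysis, assuming the library's current module and function names (these have changed across versions) and a hypothetical input file piece.wav:

```python
# Long-term feature vector for one recording: short-term frames (50 ms / 25 ms)
# -> mid-term statistics (2 s / 0.2 s) -> average over all mid-term segments.
import numpy as np
from pyAudioAnalysis import audioBasicIO, MidTermFeatures

fs, signal = audioBasicIO.read_audio_file("piece.wav")  # hypothetical input file
signal = audioBasicIO.stereo_to_mono(signal)

mid_feats, short_feats, names = MidTermFeatures.mid_feature_extraction(
    signal, fs,
    2.0 * fs, 0.2 * fs,       # 2 s mid-term window, 0.2 s step (90% overlap)
    0.050 * fs, 0.025 * fs)   # 50 ms short-term frames, 25 ms step (50% overlap)

# One long-term vector per file: the mean of each mid-term statistic over time.
# In the setup described above this yields 34 means + 34 stds = 68 values;
# newer library versions may also append delta features.
long_term = np.mean(mid_feats, axis=1)
print(long_term.shape)
```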
III. PRE-PROCESSING AND MODEL TRAINING

We have chosen a Support Vector Regressor (SVR) as our primary regression model. A Random Forest Regressor (RFR) was chosen to compare against the results obtained from the SVR, and a two-layer Artificial Neural Network (ANN) with 90 nodes in each hidden layer was chosen to see whether it works for the problem this project deals with. Hence there are a total of three regression models. The single pre-processing step we performed was max-min normalization [8] of the features. On our primary data-set, the 330 X 70 matrix, we employed 10-fold Cross Validation [9]: the algorithm chooses each fold exactly once, and the 10 folds are created randomly. The absence of repeating items in each iteration of the algorithm reduces the risk of overfitting, i.e. of training on the same data repeatedly and obtaining misleading results. To obtain the best results through hyper-parameter tuning, Grid Search [9] was employed while conducting Cross Validation; this method systematically defines a grid of all possible combinations of the parameters involved and returns the set of parameters for which the model gives the best outcome. We repeated this methodology for all three models. Moreover, our data was split into three categories based on the audio features used: (1) Temporal-Spectral-Rhythm features (330 X 70), (2) Temporal-Spectral features (330 X 68), and (3) Fisher-Score feature selection applied to the Temporal-Spectral-Rhythm data-set (330 X 28):

• (1) Using Temporal-Spectral and Rhythm/Tempo features: the rhythmic and tempo features (Features 35 and 36 in Table 1) were chosen over and above the spectral and time-domain features (Features 1-34 in Table 1), so the data-set taken as input was a 330 X 70 feature matrix.
• (2) Using Spectral-Temporal features: only the spectral and time-domain features (Features 1 to 34 in Table 1), with their means and standard deviations, were chosen, so the data-set taken as input was a 330 X 68 feature matrix.
• (3) With Feature Selection: the Generalized Fisher Score was used for feature selection [10]; it scores each feature independently according to the Fisher criterion. Feature selection was applied to the 330 X 70 feature matrix to select the most informative features (28 in total).

In this paper we aim to give a model- and feature-set-wise analysis of performance. The subsequent section presents the results categorized by the choice of feature set.
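A minimal sketch of this training procedure with scikit-learn, pairing the max-min scaler with an SVR inside a 10-fold grid search (the parameter grid is an illustrative assumption; the paper does not list its grid):

```python
# Max-min normalization, 10-fold CV and grid search for the SVR model,
# mirroring the procedure described above. X stands in for the 330 x 70
# feature matrix and y for the listener's Arousal (or Valence) annotations.
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((330, 70))   # stand-in for the real feature matrix
y = rng.random(330)         # stand-in for the real annotations

pipe = Pipeline([("scale", MinMaxScaler()),   # max-min normalization [8]
                 ("svr", SVR())])
param_grid = {"svr__C": [0.1, 1, 10],         # illustrative grid, not the paper's
              "svr__gamma": ["scale", 0.01, 0.1]}

search = GridSearchCV(pipe, param_grid,
                      cv=KFold(n_splits=10, shuffle=True, random_state=0),
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```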
TABLE II
PERFORMANCE MEASURED IN TERMS OF MEAN SQUARED ERROR, MEAN ABSOLUTE ERROR AND R2 SCORE, FOR VALENCE AND AROUSAL, WITH RESPECT TO THE DIFFERENT REGRESSION MODELS AND FEATURE VECTORS USED

Model-Feature   Arousal                   Valence
                MSE     MAE     R2        MSE     MAE     R2
SVR1-70         0.1301  0.2953  0.2497    0.1812  0.3723  0.0452
SVR2-68         0.1375  0.3075  0.2074    0.1796  0.3735  0.0535
SVR3-28         0.1702  0.3230  0.0185    0.1898  0.3305  -0.0004
RFR1-70         0.1368  0.3057  0.2112    0.1715  0.3745  0.0958
RFR2-68         0.1340  0.3000  0.2275    0.1685  0.3678  0.1121
RFR3-28         0.1518  0.3134  0.1245    0.1591  0.3617  0.1616
ANN1-70         0.1754  0.3678  -0.0111   0.1999  0.4185  -0.0533
ANN2-68         0.1712  0.3556  0.0128    0.2111  0.4357  -0.1124
ANN3-28         0.1981  0.3864  -0.1423   0.1967  0.4083  -0.0367

IV. RESULTS AND ANALYSIS

The performance of the regression models is evaluated in terms of Mean Squared Error (MSE), Mean Absolute Error (MAE) and the R2 score. From Table 2 and Figures 2 and 3 we can draw the following inferences:

• The Support Vector Regressor tests best for Arousal prediction with the primary data-set (with 70 features).
• The Support Vector Regressor also tests best for Valence prediction with the feature-selected data-set (with 28 features).
• The Neural Network architecture consistently performs the worst. One explanation would be the comparatively small data-set we have used, since neural networks generally perform better on large data-sets. This may not be true for our specific problem, but it is a plausible explanation for the poor performance.
• Using Fisher scoring to find the top 28 features does not consistently improve performance for each model type. There are two possible follow-ups to this: either Fisher scoring is not the appropriate measure of feature information in this case, or, more simply, decreasing the number of features does not in fact reduce redundancy, i.e. our current feature set is sufficient.

Fig. 2. Mean Squared Error and Mean Absolute Error values for the Arousal models run on the Arousal test set.

Fig. 3. Mean Squared Error and Mean Absolute Error values for the Valence models on the Valence test set.
V. SUMMARY AND DISCUSSION

In this work we mapped a set of recordings' musical and acoustic features to a particular listener's emotional response, in terms of Valence and Arousal. The best-case errors in terms of Mean Squared Error (MSE) between the actual and the predicted values are 0.1591 for Valence and 0.1301 for Arousal. In terms of Mean Absolute Error (MAE), our best-case error values are 0.2953 and 0.3617 for Arousal and Valence respectively. The predictions made by the algorithm reflect the emotional taste of the listener, i.e. of the one who provides the ground truth.

Even though the definition of our problem differs from typical works on Static MER, we wanted to compare our prediction accuracy. Most works on MER, however, use different performance metrics which are not comparable and cater to different problem statements. Moreover, many of them use different ranges for their outputs (e.g. reference [11] and many other works use a Valence and Arousal range from -0.5 to 0.5). The work in [12] achieved an accuracy of 40.6% and 67.4% for Valence and Arousal in terms of R-squared statistics. However, R-squared says nothing about the prediction error: even with the MSE exactly the same and no change in the coefficients, R-squared can be tailored to lie anywhere between 0 and 100% just by changing the range of the independent variable(s). Moreover, R-squared does not measure goodness of fit and can be arbitrarily low even when the model is completely correct. Hence the metric we have used is Mean Squared Error, which is better suited to the emotion prediction problem we are trying to address.

One work that is close to ours is by Yang et al. [13], who attempted to predict the general emotion in a piece of music. However, their focus was to find the absolute emotion of a piece, where the ground truth annotations are assumed to be the general consensus; using the circumplex model and a regression approach, they try to tackle the problem of subjectivity. We, on the other hand, embrace it, by making predictions with respect to an individual's subjective emotional taste. In their prediction task they achieved a best-case performance of 0.1798 and 0.1731 for Valence and Arousal in terms of Mean Absolute Error (MAE). Despite the difference in the MAE values, we believe our prediction performances are similar, given that the recordings we used were of varied length, spanning about 10-20 minutes - which is how our work primarily differs from other works. Most works, including the ones mentioned above, use fixed-duration musical pieces in their data-sets; in [14] the pieces are less than 1 minute long. Moreover, the music data-sets used by most other works combine genres like rock and pop, which are far less complex than classical music in terms of emotional content and variation.

The novelty of this work lies in the fact that our predictions are user-specific. We use comparatively smaller feature descriptors than other works - viz. [11], which uses a 128-dimensional feature set, and [12], where a 98-dimensional feature set was used - that are nevertheless informative enough to perform comparably on the prediction task.

VI. FUTURE SCOPE: MUSIC RECOMMENDATION SYSTEMS

Having shown that a regressor can satisfactorily map a piece of music to the emotional response of a listener, given their honest ground truth annotations, in this section we discuss how such a model could help build an emotion profile descriptor which could easily be integrated within existing Music Recommender Systems - one that encapsulates the user's emotional tastes in music, as well as predicts the region in which a new piece of music may lie. One's profile could be described as a hash table of Song Id and a 2-D vector of Valence and Arousal annotations from the user (see Figure 4).
The feature set being what describes the music, a varied distribution of musical pieces, when mapped to the Arousal-Valence space using a regression model as presented in this work, would adequately represent a user's musical tastes. The hash table could therefore be integrated with existing music recommendation algorithms where arousal and valence are a part of user ratings.

Fig. 4. An example of the hash tables of two different users.

Such a model would facilitate suggestions based on emotion. The similarity of users could be quantified by the cosine similarity of the Arousal and Valence ratings of the songs heard and rated by both users. Moreover, a personalized prediction model for each individual listener, as presented here, could directly be used to recommend new pieces of music. One could analyze the common patterns in the Arousal and Valence values of consecutive songs and make new suggestions previously unheard by the listener. Suggestions made with such an approach may seem completely unrelated to a random observer, but are likely to be relished by the particular listener.
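One reading of this similarity computation, sketched with a plain dictionary as the hash table (the song ids, ratings and the per-song averaging are illustrative assumptions):

```python
# Each user's profile is a hash table: song id -> (valence, arousal)
# annotation. Similarity between two users is taken here as the mean
# cosine similarity of their annotations over the songs both have rated.
import numpy as np

user_a = {"song1": (0.8, 0.6), "song2": (-0.2, 0.9), "song3": (0.1, -0.4)}
user_b = {"song1": (0.7, 0.5), "song2": (-0.4, 0.8)}

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def user_similarity(p, q):
    common = p.keys() & q.keys()   # songs heard and rated by both users
    if not common:
        return 0.0
    return float(np.mean([cosine(p[s], q[s]) for s in common]))

print(round(user_similarity(user_a, user_b), 3))
```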
Moreover, intelligent approaches could be used for deciding which sets of songs give a unique direction to the user's music emotion tastes. One approach would be to cluster the songs in the Arousal-Valence space and randomly suggest songs from the less probable clusters to enforce serendipity. After getting the Valence and Arousal ratings, one could update the hash table if the values provided are within a close threshold of the average of the user's major cluster (the major cluster being the one with the highest probability of being assigned).

This paradigm of modeling one's music emotion preferences opens up quite a number of options for music recommender systems to include emotion in their analysis; the effectiveness of these would depend on the performance of the emotion prediction model for each listener. We thus appeal to the music emotion and music recommendation system research community that a benchmark database, with a framework for efficient ground truth collection from multiple users, would go a long way towards the growth of this unique paradigm of emotion-based music recommendation.
REFERENCES

[1] Y. H. Yang, Y. C. Lin, Y. F. Su, and H. H. Chen, (2008). A regression approach to music emotion recognition. IEEE Trans. Audio, Speech Lang. Process. 16, 2, 448-457.
[2] J. A. Russell, (1980). A circumplex model of affect. Journal of Personality and Social Psychology. 39, 6, 1161-1178.
[3] Howe (2009). "Pandora's Music Recommender", a case study. Available at: https://pdfs.semanticscholar.org/f635/6c70452b3f56dc1ae07b4649a80239afb1b6.pdf (online)
[4] K. Tiffany, (2018). TL;DR: You can now play with Spotify's recommendation algorithm in your browser. Available at: https://www.theverge.com/tldr/2018/2/5/16974194/spotify-recommendation-algorithm-playlist-hack-nelson (online)
[5] J. Thickstun, Z. Harchaoui, S. M. Kakade, (2017). Learning Features of Music from Scratch. International Conference on Learning Representations (ICLR), University of Washington. Retrieved from https://homes.cs.washington.edu/~thickstn/musicnet.html
[6] T. Giannakopoulos, (2015). pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis. PLoS ONE. 10, 12. Retrieved from https://github.com/tyiannak/pyAudioAnalysis
[7] A. Ghatak, n.d. Classical Musician, Conductor and Teacher at Kolkata Music Academy.
[8] MinMaxScaler (n.d.). Retrieved from http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html (online)
[9] GridSearchCV (n.d.). Retrieved from http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html (online)
[10] Q. Gu, Z. Li, J. Han, (2012). "Generalized Fisher Score for Feature Selection". CoRR, abs/1202.3725.
[11] F. Weninger, F. Eyben, B. Schuller, (2013). The TUM Approach to the MediaEval Music Emotion Task Using Generic Affective Audio Features. In Proceedings of the MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain.
[12] R. Panda, B. Rocha, R. Paiva, (2013). "Dimensional music emotion recognition: Combining standard and melodic audio features". In Proceedings of the International Symposium on Computer Music Modelling & Retrieval.
[13] Y. H. Yang, Y. C. Lin, Y. F. Su, and H. H. Chen, (2007). "Music Emotion Classification: A Regression Approach". 2007 IEEE International Conference on Multimedia and Expo.
[14] A. Aljanaki, F. Wiering, and R. C. Veltkamp, (2015). "Audio segmentation based approach for improved emotion recognition". In Proceedings of the 16th ISMIR Conference, Malaga, Spain.

Cloud-based Lecture Capturing System
Abstract—This paper examines an innovative approach to enhance current e-learning procedures, particularly in universities. The "Lecture Capturing System" (LCS) is a cloud-based web application which uses enhanced techniques to provide an interactive e-learning experience to its users. It uses a facial-recognition-based authentication process to allow remote users to log in to the system. A Pan-Tilt-Zoom (PTZ) IP camera captures and tracks the lecturer during the lecture session, and this is streamed live to remotely logged-in students. The lecturer can also share the computer screen if required. The camera intelligently identifies specific gestures performed by the lecturer and rotates accordingly, with the aid of gesture-analyzing algorithms. Attendance of remote online students is marked automatically during a live-streamed lecture by multiple facial recognition processes executing on the server. Offline recording of lectures is also supported, after which the video is split into a series of chapters/thumbnails and the audio is converted to text, each chapter representing a presentation slide and the relevant text. Bandwidth and quota are managed intelligently to ensure the best possible transmission rate with minimum data consumption, in order to avoid filling the link to capacity, which would result in network congestion and poor network performance. This system is revolutionary and is capable of taking e-learning to the next level, as it provides a complete classroom experience and much more to remote users. It also has the ability to support multiple enterprise customers.

Keywords—PTZ camera control, gesture detection, biometric authentication and attendance, video thumbnails creation, bandwidth and quota management
I. INTRODUCTION

E-learning has become one of the newest trends, not only in the educational sector but also in businesses. As a result, students tend to prefer e-learning to being physically present at a lecture, due to various issues such as manually taking down notes, the inability to instantly understand the content of the lecture, and long travel times to get to the lecture. There are also times when a student misses an important lecture for various reasons and is never able to catch up.

Key features of the Lecture Capturing System:

· Automatically focusing on the lecturer in real time.
· Sharing the lecturer's computer screen with the students, so that they can see what the lecturer is doing on his/her computer, such as coding or annotating a PowerPoint slide.
· The lecturer communicating with students by voice/video at a student's request, so that everyone can see the conversation and clear any doubts regarding the student's question.
· Generating readable text content after the lecture by intelligently converting the lecturer's voice into text, a helpful feature as it lets students revise the lecture more efficiently.
· Video thumbnails for each chapter, making it easier for students to watch the relevant part they need without going through the whole video.
· Intelligent bandwidth management to ensure the least possible data bandwidth is used to transfer the videos to the students.
· Capturing the audience and focusing on a guest speaker in real time, which helps remotely logged-in students experience the live classroom environment.
· Biometric authentication of remotely logged-in users via facial recognition, acting as a secondary layer of security and marking the attendance of remote students.

LCS stands out from existing systems by being a comprehensive product that includes all of the above-mentioned features in one.

II. LITERATURE REVIEW

Many research efforts have addressed the needs of smart e-learning systems. Below are some of the software functionalities and technologies developed prior to our research; undertaking a literature survey helped us find and build upon the following.

Regardless of the enormous growth of e-learning in education and its perceived benefits, the efficiency of such e-learning systems will not be fully realized if students are not inclined to accept and use them. "Use of E-Learning", a research study, was conducted to find university students' purpose in using e-learning [1]. Its findings indicate that the content of e-learning and self-efficacy have a positive impact on, and are substantially associated with, perceived usefulness and student satisfaction.

In [2], the tracking of an object and the control of the camera are handled by one computer in real time. The main contribution of that paper is a method for target representation, localization and detection which takes into account both foreground and background properties and is more discriminative than the common color-histogram-based back-projection.

Online body tracking by a PTZ camera has been done before, to automatically track a single person and focus on that person [3], [4]. An online human body tracking method by an IP PTZ camera based on fuzzy-feature scoring was developed: at every frame, candidate targets are detected by extracting moving targets using optical flow, sampling, and appearance, and the target is determined among the samples using a fuzzy classifier. Results show that the system has a good target detection precision (> 88%), and the target is almost always localized within 1/4th of the image diagonal from the image center [3].

Remote control of PTZ camera systems for lecture rooms has also been addressed [5]. This consists of a simple and inexpensive software solution for the remote management of PTZ camera systems, providing users with the ability to remotely control the PTZ camera system from one place with simultaneous image-capturing ability. However, this software solution does not support real-time tracking of a person, just several predefined presets, so that feature can be improved to real-time operation in LCS.

OpenTrack - Automated Camera Control for Lecture Recordings [6] records lecture sessions automatically, without the need for a human camera operator. A Tabletop Lecture Recording System [7] presents a lecture recording system that employs gestures and digital cameras to facilitate remote distance teaching. Virtual Cameraman [8]
uses two PTZ cameras with different utilities: one is named the full-shot PTZ camera and the other the movement PTZ camera.

Real-time person tracking has been implemented before, but not quite as we have done it: real-time broadcasting of the footage without any delay. The complete package of lecture capturing along with the audience if necessary, screen sharing, face-recognition-based remote login and attendance marking for online participants, and viewing the lecture in real time with added features, such as intelligently generated chapters on the video according to the lecture slides played alongside, makes the Lecture Capturing System a complete e-learning package.

Thorough research related to e-learning systems has led to the identification of some of the most influential factors used in the field of information systems research: more specifically, the characteristics as well as the limitations, weaknesses and strengths of web-based learning systems. Student variables, such as technical issues and adapting to new ways, are important variables that influence student learning, especially in a collaborative e-learning environment. In particular, this research helps to better understand the characteristics of students and to comprehend what they expect from learning management systems. This can help developers achieve the most effective deployment of such systems, and it also helps them improve their strategic decision-making about technology in the future: they can decide on the approach that best fits their students before implementing any new technology.
Features of a set of commercially available e-learning platforms were compared with the Lecture Capturing System. Panopto is an easy-to-use video platform for training, presenting, and communicating that enables users to record videos and rich media presentations and push them out to subscribers in many different formats; it is mainly focused on recording and later pushing the recorded stream to users. BigBlueButton is open-source web collaboration software used by educational organizations for e-learning and training. It enables users to conduct web conferencing and share documents, audio, and video files for online learning, and its "whiteboard" feature allows presenters to mark valuable topics in the presentation. Echo360 combines video management with lecture capture and active learning to increase student success; it keeps notes linked to class presentations and videos so that students can jump straight from their own words to those of the instructor and replay the entire learning experience, and videos are uploaded and processed in real time so the optimized version is available as soon as class is over. Kaltura offers a broad set of video management and creation tools, tightly integrated with every LMS, covering everything from flipped classrooms to live sports broadcasts. Even though there are many e-learning applications in the market, there is no single system that handles all the necessary requirements in the way LCS has managed to implement.

III. SYSTEM ARCHITECTURE
The system is a cloud-based web application capable of supporting multiple enterprise customers. Remote students can access the system after logging in via biometric authentication (facial recognition). A PTZ IP camera provides a continuous video stream of the lecturer and audience to the central server via the Kurento media server [18] during a lecture session. The server broadcasts this stream live to the remote students, and the lecturer's screen is also shared if required. The attendance of remote students is marked by capturing their faces through their webcams and running a neural network algorithm on the server for identification. Lecturers are also able to make an offline lecture recording and then upload it to the server, where it will be split into chapters and its audio converted to text. Figure 1 shows the system with the main components and their interactions.

Figure 1: System Architecture

IV. METHODOLOGY
This section describes how the system was designed and implemented, explaining the process of each functionality, their flow in the system, and how they interact with each other. The system was implemented using cutting-edge technologies such as Node.js, Python, ReactJS, and MongoDB for storage.

A. Face Recognition based authentication
A student or a lecturer can log in to the system using the webcam. Initially, the administrator of the Lecture Capturing System should register the user by uploading quality images and the relevant details of the user. After registering a user, the server will train the face recognition classifier with the newly uploaded images of that user along with the existing images of other users. Thereafter, the user will be authenticated through the webcam-based face recognition process only if the confidence threshold of the face recognition classifier is greater than 90%; if it is less than 90%, the user will not be authenticated. In terms of security, session handling takes place after the face recognition based login.

In terms of the general architecture of the face recognition based login function, two steps are considered:
1. Detection stage: the LCS searches for the face region (displayed by a rectangle) in the whole video stream.
2. Recognition stage: the face image obtained above is contrasted with the face images trained in the database, and the registered user is predicted.
If face recognition succeeds, the recognition result is displayed in white text inside a green rectangle on the webcam feed along with the confidence percentage. If it fails, the system pops a warning. Face recognition in the system follows three main steps.
1. Prepare training data
The OpenCV computer vision library, Python, and NumPy are used as dependencies to implement the face recognition function in the system [12]. OpenCV provides two pre-trained, ready-to-use face detection classifiers, the Haar classifier and the LBP classifier [9]. The Haar Cascade classifier and the Local Binary Patterns (LBP) classifier are used to detect and recognize faces in this system. LBP is a type of visual descriptor used for classification in computer vision [10]. The LBP classifier is used due to its main advantages, such as a shorter training time, a high accuracy rate in difficult lighting conditions (useful when detecting faces through the webcam), and being computationally simple and fast [19]. A formal description of the LBP algorithm is given in Figure 2.

Figure 2: LBP Algorithm Equation

The training dataset consists of 30 images for each user, and each user is assigned a label (e.g. s1, s2) upon registering to the system. This step reads all the images of a person and applies face detection to each one using the LBP classifier. Each detected face is then added to a face vector together with the corresponding person's label. Finally, the data preparation step produces the face and label vectors shown in Figure 3 [19].

Figure 3: Face and label vectors

2. Train face recognizer
The face and label vectors returned from the data preparation step (the 'getImagesAndLabels' function in Figure 3) are converted to a NumPy array and passed to the OpenCV face recognizer for training [19]. The statistical model returned from the face recognizer is saved in a YAML file.

3. Prediction
Once the user navigates to the login page, the server automatically detects the face using the Haar classifier, predicts the identity by calling the trained OpenCV face recognizer, returns the predicted name of the user associated with the label, and live-streams the response from the server (recognized face plot, name of the user, confidence threshold) to the login page. Thereafter, the user is logged in to the system after clicking the face login button.
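As a concrete illustration of these three steps, the sketch below uses OpenCV's contrib face module, with a Haar cascade for detection and an LBPH recognizer for recognition. The dataset layout, file names, and helper name are assumptions made for illustration; the paper's own 'getImagesAndLabels' implementation may differ.

```python
# Minimal sketch of the prepare/train/predict pipeline described above.
# Assumes opencv-contrib-python is installed and training images are laid
# out as dataset/<user>/<image>.jpg; these paths are illustrative only.
import os
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()

def get_images_and_labels(root="dataset"):
    """Step 1: read every user's images, detect the face, build vectors."""
    faces, labels = [], []
    for label_id, user in enumerate(sorted(os.listdir(root))):
        for name in os.listdir(os.path.join(root, user)):
            gray = cv2.imread(os.path.join(root, user, name),
                              cv2.IMREAD_GRAYSCALE)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.2, 5):
                faces.append(gray[y:y + h, x:x + w])
                labels.append(label_id)
    return faces, labels

# Step 2: train the recognizer and persist the model as a YAML file.
faces, labels = get_images_and_labels()
recognizer.train(faces, np.array(labels))
recognizer.write("trainer.yml")

# Step 3: predict a login frame. A lower "confidence" means a closer
# match, so the paper's accuracy figure corresponds to 100 - confidence.
label, confidence = recognizer.predict(faces[0])
if 100 - confidence > 90:          # the 90% threshold used by the system
    print(f"authenticated as label {label}")
else:
    print("warning: face not recognized")
```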
B. Audio and video conferencing
An IP camera is used to track the lecturer's movement and gestures in front of the camera and to produce the necessary PTZ signals to pan, tilt, and zoom accordingly, thus ensuring that the lecturer's actions are always recorded without missing any detail. The video recording is immediately compressed on the fly to reduce its file size, then streamed live and also saved in the database for backup and later viewing. Students therefore have the choice of attending the lecture via the live stream or watching it later, which is very beneficial since they can attend lectures without being physically present. During the live streaming session, the system decides which video resolution (e.g. 480p, 720p, 1080p) to use for playback at the student's end depending on the speed of his/her internet connection, as sketched below.

Live streaming is achieved via Kurento, a WebRTC media server with a set of client APIs. During the live streaming session, the lecturer also has the ability to share his/her entire computer screen with the participating students if required, making certain that not even the most minute detail is missed. If the available bandwidth is too low to support this feature, the lecturer has the option to disable the IP cameras in order to save bandwidth. This mode of lecturing provides better participation and interaction between lecturers and students. For example, if a student wants to ask a question, the lecturer can hand control to that student, and the application supports audio-only, video-only, or combined audio-video sources from that student; the lecturer can take back control of the system's audio and video sources whenever required.
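As an illustration of the playback-quality decision mentioned above, a simple mapping from measured connection speed to a supported resolution might look as follows. The bitrate cut-offs are assumptions; the paper does not state its exact thresholds.

```python
# Illustrative sketch only: the cut-off bitrates below are assumed,
# not taken from the paper, which does not state its exact thresholds.
def choose_resolution(downlink_mbps: float) -> str:
    """Pick a playback resolution from the student's measured bandwidth."""
    if downlink_mbps >= 5.0:      # comfortable headroom for full HD
        return "1080p"
    if downlink_mbps >= 2.5:      # enough for HD without stalling
        return "720p"
    return "480p"                 # fallback for slow connections

print(choose_resolution(3.2))     # -> "720p"
```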
C. Lecture Capture and Movement of IP Camera
An IP camera is used to track the lecturer's movement and gestures in front of the camera and to produce the necessary PTZ signals to pan, tilt, and zoom along with the lecturer's movement.

1. Lecture Detection
OpenCV object detection using Haar feature-based cascade classifiers is an effective object detection method. It is a machine learning based approach in which a cascade function is trained from a large number of positive and negative images and then used to detect objects in other images; here we work with face detection. Initially, the algorithm needs many positive and negative images to train the classifier, from which features must be extracted. For this, the Haar features shown in Figure 4 are used. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

Figure 4: Haar features

Now, all possible sizes and locations of each kernel are used to calculate a large number of features, many of which are irrelevant; consider Figure 5. The top row shows two good features. The first selected feature focuses on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or anywhere else are irrelevant.

Figure 5: Features Calculated

Next, each feature is applied to all the training images. For each feature, the best threshold for classifying faces as positive or negative is found, and the features with the minimum error rate are selected, i.e. the features that most accurately separate face and non-face images.

The lecture tracking script uses OpenCV's Haar Cascade classifier for the person detection task. First, it initializes a face cascade using the frontal-face Haar cascade [19], [20]. It then detects and tracks the largest face it can find; if it is not yet tracking a face, or the tracked face is lost, it again uses the Haar cascade detector to find the face and then a correlation tracker from the dlib library to follow it. Detection requires scanning the whole frame with a sliding window, with the algorithm trying to find the features of a person at each window position. This is too expensive to perform on every frame if the person tracker is to run on restricted hardware such as a budget laptop. For this reason, the person detector is combined with a correlation tracker: the tracker takes a region of interest and tracks the pixels inside that region, and in subsequent frames it finds where those pixels have most likely moved. This is much faster and more robust than trying to find the person in each and every frame again, as shown in the sketch below.
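The following is a condensed sketch of this detect-then-track loop, combining OpenCV's frontal-face Haar cascade with dlib's correlation tracker. The camera index and the tracking-quality threshold are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the detect-then-track loop: run the (expensive) Haar detector
# only when no face is being tracked, otherwise let the (cheap) dlib
# correlation tracker follow the region of interest between frames.
import cv2
import dlib

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)              # camera index is an assumption
tracker = dlib.correlation_tracker()
tracking = False

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # dlib expects RGB
    if not tracking:
        # Expensive path: scan the whole frame with the Haar detector.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.2, 5)
        if len(faces):
            # Start tracking the largest face the detector can find.
            x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
            tracker.start_track(
                rgb, dlib.rectangle(int(x), int(y), int(x + w), int(y + h)))
            tracking = True
    elif tracker.update(rgb) < 7.0:    # low confidence: face lost, so
        tracking = False               # fall back to re-detection
    else:
        box = tracker.get_position()   # this box drives the PTZ commands
```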
2. PTZ Camera Movement
Open Network Video Interface Forum (ONVIF) is an open industry standard that provides interoperability among IP security devices such as security cameras, video recorders, software, and access control systems [21]. Since ONVIF protocols are used to move the camera, devices from different vendors remain compatible, so LCS is supported by most IP-based security device manufacturers, with the added benefit of not limiting the system to a specific brand of IP camera. Because of this, the protocol is used together with our tracking algorithm to move the camera accordingly. The custom functions can move the camera in any direction and zoom in and out to focus on the lecturer as required. The detection and tracking algorithm only calls the move function when a significant movement is detected, ensuring that focus is not broken by a slight movement of the lecturer.
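A minimal sketch of issuing such a move through ONVIF with the python-onvif package follows. The camera address, credentials, and velocity values are placeholders, and the exact client API differs between ONVIF libraries.

```python
# Sketch of an ONVIF ContinuousMove; host, port, and credentials are
# placeholders, and library details vary (python-onvif / onvif-zeep).
from onvif import ONVIFCamera

cam = ONVIFCamera("192.168.1.64", 80, "admin", "password")
media = cam.create_media_service()
ptz = cam.create_ptz_service()
profile = media.GetProfiles()[0]

def move(pan: float, tilt: float, zoom: float = 0.0):
    """Issue a ContinuousMove; velocities are normalized to [-1.0, 1.0]."""
    req = ptz.create_type("ContinuousMove")
    req.ProfileToken = profile.token
    req.Velocity = {"PanTilt": {"x": pan, "y": tilt}, "Zoom": {"x": zoom}}
    ptz.ContinuousMove(req)

def stop():
    ptz.Stop({"ProfileToken": profile.token})

# e.g. pan right slowly while the tracked face drifts off-centre:
move(0.3, 0.0)
```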
D. Easy Screen Share
The ability to share either the complete screen or a selected application window right from the web browser, and to stream it alongside the main live stream, makes lecturing easier and more efficient. This is helpful because, most of the time, when a lecturer plugs a laptop into the main projector in a normal classroom, all of the desktop content and open browser tabs are visible to the students; privacy is a concern, and it is a hassle for the lecturer to switch screen sharing on and off all the time. With the Easy Screen Share feature, the lecturer can stream webcam footage alongside the shared screen if required (for example, when sitting in front of the laptop and blocking the main camera view). This extension simply initializes socket.io and configures it so that a single audio/video/screen stream can be shared and relayed across users without bandwidth or CPU usage issues. It uses RTCMultiConnection, a WebRTC library for WebRTC streaming [22].

E. Gesture-Based Camera Control
When a student who is physically present in the classroom has a doubt, the lecturer directs the camera towards the audience by performing a gesture at the camera: showing a hand with all five fingers unfolded. The camera then analyzes and recognizes the gesture using OpenCV and Python and turns towards the audience so that remotely logged-in users see what is happening in the classroom. Once the student has finished asking the question, the lecturer turns the camera back to its normal position, either by pressing a button on the interface or by performing another gesture at his/her webcam. One possible way to recognize the open-hand gesture is sketched below.
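The paper does not detail its recognition algorithm, so the following is only one plausible OpenCV-based approach under that assumption: segment the hand and count the convexity defects (the gaps between fingers) of its contour. The skin-colour range and depth threshold are illustrative.

```python
# Rough sketch of one way to recognize the "open hand" gesture with
# OpenCV; thresholds and the skin-colour range are assumptions.
import cv2

def looks_like_open_hand(frame_bgr) -> bool:
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))  # rough skin range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return False
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return False
    # Deep defects roughly correspond to the gaps between fingers;
    # four deep gaps suggest five unfolded fingers.
    deep = sum(1 for i in range(defects.shape[0])
               if defects[i, 0, 3] / 256.0 > 20)  # defect depth in pixels
    return deep >= 4
```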
F. Open Broadcaster Software (OBS) Studio plugin
A plugin was implemented for the OBS Studio [23] software which allows a lecturer to record a lecture offline and then upload it directly to the server. After recording the lecturer's desktop screen while s/he conducts a lecture, the plugin uploads this video to the remote server at the click of a button, based on predefined settings. These settings can be changed by the lecturer to suit their needs (e.g. upload the video now or at a later scheduled time). Once the video is uploaded to the remote server, the next level of processing is done on the server.

G. Video Thumbnails/Chapters Creation
The lecturer can view the list of videos which have been uploaded to the server and select a video from this list to be converted into a series of thumbnail chapters. Each frame in the video is analyzed by the PySceneDetect algorithm [24], which is implemented in Python and makes use of the OpenCV, NumPy, and FFmpeg libraries. There are two main detection methods which PySceneDetect uses:

Threshold detection - Compares the intensity/brightness of the current video frame with a set threshold and triggers a scene cut/break when this value crosses the threshold. The threshold value is computed by averaging the Red-Green-Blue (RGB) values of every pixel in the frame, yielding a single floating-point number representing the average pixel value (from 0.0 to 255.0).

Content-aware detection - Finds places where the difference between two subsequent video frames exceeds the set threshold value and then triggers a scene cut. This allows cuts to be detected between two scenes that both contain content, unlike most traditional scene detection methods, and with a properly set threshold it can even detect minor, abrupt changes [25]. The method takes the threshold and, optionally, the minimum scene length in frames as input parameters. It compares the difference in content between adjacent frames against the set threshold/score, which, if exceeded, triggers a scene cut. It checks for changes in color and intensity, namely the average HSV color space difference (the difference in hue, saturation, and luminance of the frame) between video frames [26]. If this calculated value is much higher than the preceding and following values, there has been a scene change. This process is repeated for the entire length of the video clip until the whole clip is analyzed and all the video chapters are created.
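A minimal sketch of driving this chapter detection from Python with the v0.5-era PySceneDetect API (the version cited in [24]) is shown below; the file name and threshold value are illustrative.

```python
# Sketch of chapter creation with PySceneDetect's content-aware detector.
from scenedetect import VideoManager, SceneManager
from scenedetect.detectors import ContentDetector

video_manager = VideoManager(["lecture.mp4"])   # input file is illustrative
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector(threshold=30.0))

video_manager.start()
scene_manager.detect_scenes(frame_source=video_manager)

# Each (start, end) pair becomes one chapter of the lecture video.
for start, end in scene_manager.get_scene_list():
    print(f"chapter: {start.get_timecode()} - {end.get_timecode()}")
video_manager.release()
```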
Following this process is the real-time speech transcription (audio-to-text conversion) of each video chapter. First, the audio is extracted from the video chapters in mp3 format by the FFmpeg library. Next is the speech transcription procedure, which is achieved via the Watson Speech-to-Text service. This service leverages machine intelligence to transcribe the human voice accurately [27], combining information about grammar and language structure with knowledge of the composition of the audio signal. A sketch of this two-step procedure follows.
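The sketch below assumes the ibm-watson Python SDK; the API key, service URL, and file names are placeholders, and SDK details vary between versions.

```python
# Sketch of the per-chapter transcription step: extract mp3 audio with
# FFmpeg, then send it to Watson Speech to Text. Credentials, URL, and
# file names are placeholders.
import subprocess
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# 1) Extract the audio track of one chapter as mp3 (FFmpeg CLI).
subprocess.run(["ffmpeg", "-i", "chapter1.mp4", "-vn",
                "-acodec", "libmp3lame", "chapter1.mp3"], check=True)

# 2) Transcribe it with the Watson Speech to Text service.
stt = SpeechToTextV1(authenticator=IAMAuthenticator("API_KEY"))
stt.set_service_url(
    "https://api.us-south.speech-to-text.watson.cloud.ibm.com")
with open("chapter1.mp3", "rb") as audio:
    result = stt.recognize(audio=audio,
                           content_type="audio/mp3").get_result()

text = " ".join(r["alternatives"][0]["transcript"]
                for r in result["results"])
print(text)
```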
Therefore, the end result is a set of videos along with their respective audio, presentation slides, and text. These videos are stored in the database so that students can access them at any time after the lecture session to further understand and clarify their knowledge.

H. Facial Recognition based Attendance Marking
Using facial recognition, attendance is marked automatically both for the students who are present in the lecture room and for the students who are logged in remotely through the Lecture Capturing System during the live streaming lecture session. The administrator and the lecturer are able to view, modify, and filter the attendance of students, and a student is able to view his/her own attendance with the aid of the available filtering options. A noticeable advantage of this feature is that it adds an extra layer of security to the system, ensuring that only authorized persons gain access to the university's content. Another clear advantage of this method of biometric authentication appears during an online exam, where it verifies that the person on the other end is actually who they claim to be. This feature also solves the problem of students marking attendance for other students. A minimal sketch of recording a recognized face as an attendance entry follows.
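The sketch below assumes a MongoDB collection as the storage layer (the paper names MongoDB as its store); the collection and field names are illustrative, while the 90% threshold and the "accuracy = 100 - confidence" convention follow the description above.

```python
# Sketch: turn a recognized face into an attendance record in MongoDB.
# Collection/field names are assumptions; the threshold follows the paper.
import datetime
from pymongo import MongoClient

attendance = MongoClient("mongodb://localhost:27017")["lcs"]["attendance"]

def mark_attendance(recognizer, face_img, session_id, labels):
    label, confidence = recognizer.predict(face_img)
    if 100 - confidence > 90:                 # same threshold as login
        attendance.update_one(
            {"session": session_id, "student": labels[label]},
            {"$setOnInsert": {"time": datetime.datetime.utcnow()}},
            upsert=True)                      # one record per student
        return True
    return False
```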
I. Bandwidth Management
The size of the data passed between the client and the Node server is reduced using bandwidth optimization techniques such as compression and clustering. The administrator can monitor bandwidth using the bandwidth monitoring dashboard, which shows traffic usage, system information, CPU load, alerts that flag exceeded predefined threshold settings and attacks, and much more; it is accessible only to the administrator of the system.
J. Quota Management
The administrator is able to manage the internet quota allocation for users from the dashboard. The list of users, along with usage statistics such as used quota and remaining quota, can be viewed and filtered by user type (e.g. lecturer, student), month, and year. This monthly quota can be edited by the administrator for a single user (e.g. a specific lecturer's id) or for all users of a particular user type (e.g. all students).

V. RESULTS AND DISCUSSION
In terms of face recognition based login, the LBP classifier reported an accuracy level of 70.33% for a particular user in face detection while maintaining a shorter training time. In contrast, the Haar classifier reported 81.05% for a user and took a longer training time; the LBP classifier thus underperformed the Haar classifier when detecting faces. Since the lecture capturing system's face recognition based login should be fast and should maintain an accuracy level greater than 80%, the Haar classifier is the solution currently used in the lecture capturing system. However, these results were derived by allocating 30 medium-quality training images captured from a webcam for each user; the results reported by the classifiers were therefore not fully satisfactory, and the accuracy can change if each user is allocated more high-quality images for training.

Comparison between the Haar classifier and the LBP classifier:
1. The LBP classifier is faster than the Haar classifier.
2. The Haar classifier uses floats for all its calculations, while the LBP classifier uses integers.
3. The LBP classifier is less accurate than the Haar classifier.
4. Haar-like features in the Haar Cascade classifier work best for frontal face detection.
5. Haar features are good at detecting edges and lines, which is effective in face detection.

Accuracy rates mentioned in the table below are derived using the formula:
Accuracy % = 100 - Confidence Index
The confidence index returns zero for a perfect match in detection or recognition; otherwise, an 'unknown' label is put on the face.

                                     LBP classifier   Haar classifier
Average accuracy of recognition      60%              90%
Processing time for encoding and
training a user with 30 face images  1.8 min          2.1 min

Training time was also tested with face datasets in the 50-100 images-per-user range. A laptop with an 8th-generation Intel i7 and 8 GB of RAM was used to obtain the results below.

Images per user   LBP classifier   Haar classifier
50 images         2.5 min          4 min
100 images        3.9 min          6.9 min

The amount of main memory used to execute each algorithm is defined as memory used and given in MB.

               LBP algorithm   Haar-like algorithm
Memory used    123 MB          290 MB

With regards to the video chapter creation feature, each frame in the video is analyzed by an algorithm for changes in color and intensity, namely the average HSV color space difference (the difference in hue, saturation, and luminance of the frame). If this calculated value is much higher than the preceding and following values, there has been a scene change, and the video is split at this time frame. This process is repeated for the entire length of the video clip until the whole clip is analyzed and all the video chapters are created.
This process was run over repeated 100-iteration cycles, producing an average accuracy of 95%.

Previously, many methods have been introduced as e-learning platforms, but this research has taken a different path by replicating a complete classroom-like experience. Live streaming combined with recorded sessions of a lecture helps all students, even those who were present at the lecture itself: the ability to revise what was missed when a student arrives at a lecture late, and the ability to go through a previous lecture before attending the next one, can result in a large academic improvement.

The main purpose of this system is to offer an effective way to help students access learning materials and information from anywhere and to quickly recap any forgotten or missed lectures via the earlier recordings of the sessions. The system provides a method for an interactive Internet-based video conferencing multicast operation which uses a video production studio with a live instructor giving lectures in real time to the participating students. The video conference multicasting permits students to interact with the instructor during the course of the lecture and to later browse the recorded session without hassle.

VI. CONCLUSION AND FUTURE WORK
This paper examines an innovative approach that is well suited to developing a lecture capturing system that provides a complete classroom experience to remotely logged-in students. The system stands out from other existing products by being a comprehensive product that includes biometric authentication, gesture detection, live streaming of lectures, automated attendance marking, offline recording of lectures, bandwidth management, and desktop screen capturing all in one.

This research work has been developed mainly to address problems in Sri Lankan universities, specifically the lack of interactivity between the lecturer and the students. Though this research focuses on universities, it has the potential to be used in other fields such as business conferences. In the next stage, the research team will focus on improving the accuracy of the face recognition and gesture detection models by testing other algorithms, and on minimizing bandwidth costs by testing further bandwidth optimization techniques. It is hoped that, for anyone who expects to build a similar system or any other real-time system, the results of this research will be an aid and will provide insight into the performance, accuracy, and reliability that can be expected from the combination of tools, technologies, and programming approaches considered in this paper.
REFERENCES
[1] W., "Use of E-Learning," Universiti Teknologi Malaysia, Johor, Malaysia, 2018.
[2] K. Kumar and T. S. Sheng, "Real Time Target Tracking with Pan Tilt Zoom Camera," presented at Digital Image Computing, Adelaide, 2009.
[3] P. D. Z. Varcheie, "Online Body Tracking by a PTZ Camera in IP Surveillance System," Department of Computer Engineering and Software Engineering, Station Centre-ville, Montréal, Québec, Canada, 2009.
[4] T. G. Dries Hulens, "Autonomous lecture recording with a PTZ camera," presented at the Canadian Conference on Computer and Robot Vision, Belgium, 2014.
[5] M. M. M. H. R. Jacko, "Remote control of the PTZ camera system for lecture rooms," Department of Computers and Informatics, 2015.
[6] B. Wulff, "OpenTrack - Automated Camera Control for Lecture Recordings," IEEE International Symposium on Multimedia, 2011.
[7] Y.-Q. Chen, C.-F. C., and P.-C. S., "A Tabletop Lecture Recording System," in International Conference on Consumer Electronics-Taiwan, Taiwan, 2015.
[8] C.-Y. Fang, Y.-T. Tsai, S. Chu, and S.-W. Chen, "Virtual Cameraman," Department of Computer Science and Information Engineering, Taiwan, 2015.
[9] "The Way Online Video Streaming Works Has Changed." [Online]. Available: https://www.panopto.com/blog/the-way-video-works-online-has-changed.
[10] "Video Analytics & Engagement Dashboard - Panopto Video Platform." [Online]. Available: https://www.panopto.com/features/video-cms/video-analytics.
[11] "Face Detection using OpenCV and Python: A Beginner's Guide." [Online].
[12] "Analytics to improve student success - Echo360." [Online]. Available: https://echo360.com/platform/analytics/. [Accessed: 15-May-2018].
[13] I. A. Orobor and O. Godswill, "Automated Student Attendance Management System Using Face Recognition." [Online]. Available: http://www.academia.edu/37437099/Automated_Student_Attendance_Management_System_Using_Face_Recognition. [Accessed: 10-Oct-2018].
[14] V. Mankar and S. G. Bhele, "A Review Paper on Face Recognition Techniques," International Journal of Advanced Research in Computer Engineering & Technology, vol. 1, pp. 339-346, Oct. 2012.
[15] F. Ahmad, "Image-based Face Detection and Recognition." [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1302/1302.6379.pdf.
[16] "WebRTC 1.0: Real-time Communication Between Browsers." [Online]. Available: https://www.w3.org/TR/webrtc/. [Accessed: 10-Oct-2018].
[17] P. Braun, M. Sipos, P. Ekler, and F. Fitzek, "On the Performance Boost for Peer To Peer WebRTC-based Video Streaming with Network Coding," 2017.
[18] "What's Kurento - Kurento." [Online]. Available: https://www.kurento.org/whats-kurento. [Accessed: 14-Mar-2018].
[19] "OpenCV library." [Online]. Available: https://opencv.org.
[20] "OpenCV: Face Detection using Haar Cascades." [Online]. Available: https://docs.opencv.org/3.4.2/d7/d8b/tutorial_py_face_detection.html.
[21] "Onvif." [Online]. Available: https://www.onvif.org/onvif/ver20/util/operationIndex.html.
[22] "WebRTC Home | WebRTC." [Online]. Available: https://webrtc.org.
[23] "Open Broadcaster Software | Home." [Online]. Available: https://obsproject.com/. [Accessed: 26-Mar-2018].
[24] "Command Reference — PySceneDetect v0.5 documentation." [Online]. Available: https://pyscenedetect-manual.readthedocs.io/en/latest/cli/commands.html. [Accessed: 09-Oct-2018].
[25] B. Castellano, PySceneDetect: a Python/OpenCV-based scene detection program using threshold/content analysis on a given video. Breakthrough/PySceneDetect, 2018.
[26] "Introduction - PySceneDetect." [Online]. Available: https://pyscenedetect.readthedocs.io/en/latest/.
[27] "Watson Speech to Text," 28-Nov-2016. [Online]. Available: https://www.ibm.com/watson/services/speech-to-text/.