
International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 3, Issue 2, February - 2016. ISSN : 2348 4853, Impact Factor 1.317

Detecting User Relevant Applications


Miss. Monika Balasaheb Kadam, Mr. Kiran Bhausaheb Sawant, Mr. Aditya Dagadu Chavan,
Mr. Vaibhav Pandurang Khedekar, Prof.Sunil Ghadge
Kjs Trinity College of Engineering and Research Pune-48, Maharashtra, India
kadammona281@gmail.com, kiranbsawant40@gmail.com, adityac780@gmail.com,
vpkhedekar@gmail.com, sunil23fri@gmail.com
ABSTRACT
Mobile phones have become a necessity of daily life. The Play Store is an attractive feature that lets
users install applications, stay up to date, and much more. Users should be able to rely on genuine
reviews rather than spam reviews, which can mislead the customer. In this paper we describe an
application for the Play Store that helps the customer detect whether an application is fake or
deceptive. The application also protects personal information that must not be disclosed but that
other applications can access or demand.
Index Terms : Review, rating, ranking, feature, security, spam

I. INTRODUCTION

In this paper we propose an analysis model that detects deceptive application reviews, that is, whether
an application's reviews are spam. To detect this we use the Author Spamicity Model (ASM) [2], which
estimates spamicity from an author's behavior while writing reviews.
Such behaviors include review similarity, rating burstiness, and writing a whole paragraph as a review. To
analyze the reviews that remain as genuine, we apply a Natural Language Processing (NLP) algorithm;
the results can be displayed in graphical form so that users can understand them easily. Secondly, we
protect private information stored on the mobile device, such as contacts, images, and GPS location [3].
To do so, we need to extract permission information, which requires a device with administrative
privileges, i.e. a rooted device. Devices currently on the market ship with user-level privileges, so we
need to root them by applying device-specific patches. We also provide a security advisor that gives
advice about applications and the permissions they hold, indicating which permissions are risky.
In summary, we provide an application that offers sentiment advice, security advice, and security for
mobile devices.
II. SYSTEM ARCHITECTURE
To find the most user-relevant application, we propose an Android application that analyzes data such
as reviews, rank, and rating, and also protects the user's personal information.
The user first enters the name of the Android application that they want to download, in order to check
its relevancy. Our application then searches Google for the link to that application and uses the link to
open the application's page on the Google Play Store. After that it retrieves information about the app,
such as its version, size, reviews, ranking, category, rating, and history.

41 | 2016, IJAFRC All Rights Reserved

www.ijfarc.org

Before this retrieved data can be analyzed, it must be stored in an SQLite database. The analysis is then
performed on the stored data. The first analysis is performed on the application's reviews, and its first
step is to separate genuine reviews from spam reviews. This separation is done using the Author
Spamicity Model, which identifies reviewer behaviors that can be considered spamming [2].
After separating genuine reviews from spam reviews, we perform sentiment analysis on the genuine
reviews using a Natural Language Processing algorithm. Next, rating analysis is performed on the
application's rating: if the overall rating is greater than three, the result is taken as positive; otherwise it
is taken as negative. A new rank for the application is generated from the outcomes of the rating and
review analyses. If those outcomes are good, the new rank should place the application near the top of
the list; if they are poor, the new rank should place it lower in the list. The next analysis is description
analysis. Here keywords are extracted from the application's description, so that only the main features
of the application are shown as keywords. This is useful because the user can read these keywords
instead of the whole description paragraph.

Fig 1. System Architecture


Finally, all of these analyses produce outcomes in the form of positive and negative results, and we use
the K-means algorithm to form separate clusters of positive and negative results. Many installed
applications steal the user's personal information, such as contacts, images, call logs, messages,
location, subscriber ID, SIM serial number, and device ID. Many applications have no need for this
information, so we protect the user's private data by blocking or allowing access permissions for
installed applications. If an application does not work properly after its access permission is blocked,
we can instead supply fake information to that application.
III. IMPLEMENTATION MODULES
A. Data Parser
This is the first module. It fetches information about an application, such as its reviews, rating, version,
ranking, and size, from the Google Play Store. The user does this simply by passing the app's name to this
application and clicking the search button, after which it fetches all of this information and stores it in
an SQLite database. This is shown in the figure.
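
The storage step of the Data Parser can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table layout, the column names, and the helper names `create_store` and `save_app` are assumptions, and the Play Store fetch itself is replaced by hand-written sample data.

```python
import sqlite3

def create_store(conn):
    """Create two hypothetical tables: one row per app, one row per review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS apps ("
        "name TEXT PRIMARY KEY, version TEXT, size_mb REAL, "
        "category TEXT, rating REAL, store_rank INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "app_name TEXT REFERENCES apps(name), "
        "author TEXT, stars INTEGER, posted_on TEXT, body TEXT)"
    )

def save_app(conn, app, reviews):
    """Insert (or overwrite) one app row and append its reviews."""
    conn.execute(
        "INSERT OR REPLACE INTO apps VALUES (?, ?, ?, ?, ?, ?)",
        (app["name"], app["version"], app["size_mb"],
         app["category"], app["rating"], app["store_rank"]),
    )
    conn.executemany(
        "INSERT INTO reviews (app_name, author, stars, posted_on, body) "
        "VALUES (?, ?, ?, ?, ?)",
        [(app["name"], r["author"], r["stars"], r["posted_on"], r["body"])
         for r in reviews],
    )
    conn.commit()

# Sample data standing in for what the parser would fetch (made up for illustration).
app = {"name": "ExampleApp", "version": "1.2", "size_mb": 14.5,
       "category": "Tools", "rating": 4.1, "store_rank": 37}
reviews = [
    {"author": "u1", "stars": 5, "posted_on": "2016-01-03",
     "body": "This app is very useful"},
    {"author": "u2", "stars": 2, "posted_on": "2016-01-09",
     "body": "Keeps crashing"},
]

conn = sqlite3.connect(":memory:")  # an on-device file path would be used in practice
create_store(conn)
save_app(conn, app, reviews)
review_count = conn.execute(
    "SELECT COUNT(*) FROM reviews WHERE app_name = ?", ("ExampleApp",)
).fetchone()[0]
```

Keeping reviews in their own table lets the later modules query them per author or per day without re-fetching from the store.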

B. Review Analysis
In this module we use the Author Spamicity Model (ASM) to separate genuine reviews from spam
reviews in the data fetched from the Play Store. The following behaviors are used to separate
genuine and spam reviews [2].
1. Content Similarity:
Writing a new review every time takes effort, so instead of typing new reviews a spammer
simply copies reviews across the same application. We use cosine similarity to capture the
content similarity of reviews written by the same reviewer and take the maximum similarity
found; a high maximum content similarity is considered spamming behavior [2].
2. Maximum Number of Reviews:
Posting multiple reviews in a single day is also abnormal behavior typical of spammers. Here we
count the reviews posted by a single author, and those reviews are considered as a single review [2].
3. Reviewing Burstiness:
Spammers tend not to visit the site regularly, whereas genuine reviewers use their accounts from
time to time to post reviews. It is therefore necessary to detect bursts as spamming activity. To
define review burstiness we use an activity window. If an author's reviews are posted within a
very short time, they are considered spam reviews; if they are spread over a long time, this is
considered normal behavior [2].
4. Duplicate or Near-Duplicate Reviews:
To bump up an application's ratings, spammers pose as genuine reviewers and write many
reviews that are similar to the first reviews of the same application [2].
5. Extreme Rating:

To place a product at the top of the Play Store lists, or to push it down, a spammer gives extreme
ratings (1 or 5 stars) to promote or demote the product. On a 5-star rating scale, such extreme
ratings can indicate spamming behavior [2].
6. Rating Deviation:
Review spamming affects the rating and ranking parameters by projecting a wrong impression, on
either the positive or the negative side, that differs from the true sentiment about the application.
A spammer's ratings therefore tend to deviate from the normal rating [2].
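
Several of the behaviors listed above can be computed per author with a few small functions. The sketch below is only illustrative: the function names, the 28-day activity window, the bag-of-words representation, and the normalization of rating deviation by 4 (the widest gap on a 5-star scale) are assumptions made here, and the ASM inference that combines these features is not shown.

```python
import math
from collections import Counter
from datetime import date
from itertools import combinations

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two reviews."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def max_content_similarity(texts):
    """Highest pairwise similarity among one author's reviews (0.0 if fewer than two)."""
    return max((cosine_similarity(x, y) for x, y in combinations(texts, 2)),
               default=0.0)

def max_reviews_per_day(dates):
    """Largest number of reviews the author posted on any single day."""
    return max(Counter(dates).values(), default=0)

def burstiness(dates, window_days=28):
    """1.0 when all reviews share one day, falling to 0.0 beyond the assumed window."""
    if not dates:
        return 0.0
    span = (max(dates) - min(dates)).days
    return 0.0 if span > window_days else 1.0 - span / window_days

def extreme_rating_ratio(stars):
    """Fraction of the author's ratings that are 1 or 5 stars."""
    return sum(s in (1, 5) for s in stars) / len(stars) if stars else 0.0

def rating_deviation(stars, app_mean):
    """Mean absolute gap to the app's average rating, scaled to the range 0..1."""
    return sum(abs(s - app_mean) for s in stars) / (4 * len(stars)) if stars else 0.0

# One hypothetical author with three reviews of the same app.
texts = ["great app love it", "great app love it", "works fine for me"]
dates = [date(2016, 1, 1), date(2016, 1, 1), date(2016, 1, 8)]
stars = [5, 5, 4]
features = {
    "content_similarity": max_content_similarity(texts),
    "max_reviews_per_day": max_reviews_per_day(dates),
    "burstiness": burstiness(dates),
    "extreme_rating": extreme_rating_ratio(stars),
    "rating_deviation": rating_deviation(stars, app_mean=3.0),
}
```

Each feature is scaled so that larger values point toward spamming behavior, which makes the features easier to combine in a later scoring step.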
After separating out the genuine reviews, we apply a Natural Language Processing (NLP) algorithm to
them. NLP is a branch of artificial intelligence and computer science that concentrates on the interaction
between human languages and computers. Our NLP algorithm performs four filtration steps. The first is
tokenization, in which we separate the tokens out of the whole review statement. For example, for the
user review "This app is very useful", the tokens are "This", "app", "is", "very", and "useful". The second
is stop-word filtration: stop words such as "is", "are", and "this" are found and removed from the
collection of tokens. The third is stem filtering, applied to the remaining tokens: we find suffixes such as
"ing" and "tion" and remove them. The fourth is sentiment analysis, in which we determine whether the
review is positive or negative by using special keywords.
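
The four filtration steps can be sketched as a small pipeline. The stop-word list, the two suffixes, and the positive and negative keyword sets below are tiny illustrative assumptions, not the lists used in the paper, and the suffix stripping is deliberately crude compared with a real stemmer.

```python
import re

# Illustrative word lists; a real system would use much larger ones.
STOP_WORDS = {"is", "are", "this", "the", "a", "an", "and", "it"}
SUFFIXES = ("ing", "tion")            # the two suffixes named in the text
POSITIVE = {"useful", "good", "great"}
NEGATIVE = {"bad", "crash", "slow", "useless"}

def tokenize(review):
    """Step 1: split a review into lowercase word tokens."""
    return re.findall(r"[a-z']+", review.lower())

def remove_stop_words(tokens):
    """Step 2: drop tokens that carry no sentiment on their own."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Step 3: strip a known suffix, leaving short words untouched."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def sentiment(review):
    """Step 4: count positive and negative keywords among the stemmed tokens."""
    tokens = [stem(t) for t in remove_stop_words(tokenize(review))]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tokens = tokenize("This app is very useful")
label = sentiment("This app is very useful")
```

The keyword sets are stored in stemmed form ("crash" rather than "crashing") so that they match the output of step 3.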
C. Rating Analysis
Rating analysis is an important task in detecting user-relevant applications. It matters because
users rate an application after downloading and using it, and a user who wants to download an
application first looks at the ratings given by other users. Ratings therefore strongly influence how
well an application is promoted: if an application has a high rating, its number of downloads also
increases. The results of the rating analysis are shown using pie charts and graphs. According to our
study, normal apps receive a consistent pattern of ratings over time, whereas fake apps receive high
ratings only during certain periods. In this analysis, if the application's rating is more than three
stars we take it as a positive result; otherwise we take it as a negative result.
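
The threshold rule above, together with the per-star shares that a pie chart would display, can be sketched as follows. The function names are hypothetical, and using the mean of the individual star ratings as the "overall rating" is an assumption.

```python
from collections import Counter

def rating_outcome(stars, threshold=3.0):
    """Positive only when the mean rating is strictly above the threshold."""
    mean = sum(stars) / len(stars)
    return "positive" if mean > threshold else "negative"

def star_distribution(stars):
    """Share of each star value from 1 to 5, suitable as pie-chart input."""
    counts = Counter(stars)
    return {s: counts.get(s, 0) / len(stars) for s in range(1, 6)}

sample = [5, 4, 4, 2]                 # made-up ratings for one app
outcome = rating_outcome(sample)      # mean is 3.75
shares = star_distribution(sample)
```

A rating of exactly three falls on the negative side here, matching the "more than three stars" wording.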
D. Ranking Analysis
Here we calculate a new rank for the application on the basis of the outcomes of the rating and
review modules [13]. We then compare the newly generated rank with the application's current
rank. If the two are close, the result is taken as positive; otherwise it is taken as negative.
E. Aggregation
Here we group the positive and negative outcomes of the review, rating, and ranking analyses into
separate clusters. This is done using the K-means algorithm, which forms the separate clusters
efficiently.
F. Description Analysis
In this module we analyze the feature description of an application. We extract only the main
keywords from the long description paragraphs and show them to the end user. We can also
compare multiple applications of the same category. This helps the user find a relevant
application.
G. Security

When we install an application, it asks to be granted permission to access the user's personal
information. Once that permission is granted, the application can steal personal information such as
call logs, SMS messages, contact numbers, images, and the user's location. Many applications have
no need for this information and may misuse it. To protect the end user's private data, we either
block that access or provide fake information to the application. An application cannot fetch data
from the mobile device when access is blocked.
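
The allow, block, or fake decision can be modeled as a per-application policy lookup. The sketch below only illustrates that decision logic in plain Python; the class `PrivacyGuard`, the `Access` values, and the placeholder fake data are hypothetical, and it does not show how access is actually intercepted on a rooted Android device.

```python
from enum import Enum

class Access(Enum):
    ALLOW = "allow"   # hand over the real data
    BLOCK = "block"   # refuse the request
    FAKE = "fake"     # hand over harmless placeholder data

# Placeholder values returned in FAKE mode (made up for illustration).
FAKE_DATA = {"contacts": [], "location": (0.0, 0.0), "device_id": "000000000000000"}

class PrivacyGuard:
    """Decides, per (app, resource) pair, what an app gets to see."""

    def __init__(self, real_data):
        self._real = real_data
        self._policy = {}

    def set_policy(self, app, resource, access):
        self._policy[(app, resource)] = access

    def read(self, app, resource):
        # Anything without an explicit policy is blocked by default.
        access = self._policy.get((app, resource), Access.BLOCK)
        if access is Access.ALLOW:
            return self._real[resource]
        if access is Access.FAKE:
            return FAKE_DATA[resource]
        raise PermissionError(f"{app} may not read {resource}")

guard = PrivacyGuard({"contacts": ["Alice"], "location": (18.52, 73.86),
                      "device_id": "real-id"})
guard.set_policy("maps_app", "location", Access.ALLOW)
guard.set_policy("torch_app", "contacts", Access.FAKE)

real_location = guard.read("maps_app", "location")
fake_contacts = guard.read("torch_app", "contacts")
```

The FAKE mode corresponds to the case where an application stops working once access is blocked: it still receives a well-formed answer, but not the user's real data.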
K-means algorithm
Here we use the k-means algorithm to cluster the positive and negative results. The algorithm minimizes
a squared-error objective function, given by the following formula.
$$C(n) = \sum_{a=1}^{d} \sum_{b=1}^{d_a} \left( \lVert m_a - n_b \rVert \right)^2$$
where,
$\lVert m_a - n_b \rVert$ is the distance between $m_a$ and $n_b$,
$d_a$ denotes the number of data points in the $a$-th cluster, and
$d$ denotes the number of cluster centers.

Steps for k-means clustering

Let A = {p1, p2, p3, ..., pn} be the set of data points and V = {q1, q2, ..., qc} be the set of centers.
1. Pick c initial cluster centers.
2. Calculate the distance between each data point and each cluster center.
3. Assign each data point to the cluster whose center is at the minimum distance from it.
4. Recalculate the new cluster centers using the following formula:
$$q_n = \frac{1}{m_n} \sum_{r=1}^{m_n} p_r$$
where $m_n$ denotes the number of data points in the $n$-th cluster and $p_r$ is the $r$-th data point assigned to that cluster.
5. Recalculate the distance between each data point and the newly obtained cluster centers.
6. Stop if no data point was reassigned; otherwise repeat from step 3.
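
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the initial centers are passed in by the caller rather than picked by the routine, and the one-dimensional scores in the example are made up to stand in for analysis outcomes.

```python
import math

def kmeans(points, initial_centers, max_iter=100):
    """Cluster points (equal-length tuples); returns (centers, labels)."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda a: math.dist(p, centers[a]))
            for p in points
        ]
        if new_labels == labels:
            break  # Step 6: no point was reassigned.
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        for a in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == a]
            if members:  # an empty cluster keeps its previous center
                centers[a] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

def objective(points, centers, labels):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical 1-D outcome scores: low values lean negative, high lean positive.
scores = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, labels = kmeans(scores, initial_centers=[(0.0,), (10.0,)])
cost = objective(scores, centers, labels)
```

With two clusters, the group whose center has the higher score can be read as the positive cluster and the other as the negative one. `math.dist` requires Python 3.8 or later.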
IV. FUTURE SCOPE
In future we will try to build the application not only for Android but also for other operating systems
such as iOS. We also plan to gather stronger evidence to help detect fraudulent applications and give
much better results, rather than leaving users with irrelevant applications. For security, users would
gain more privacy if a built-in system application performed all of this analysis and offered only secure
and relevant applications to the user.
V. CONCLUSION
Nowadays mobile phone use is growing tremendously, and mobile users rely on applications from the
app store for daily tasks and entertainment. There are millions of applications available on the app store.
Users trust the reviews, ratings, number of downloads, and rank of an application, any of which may be
fake. We are therefore developing a mobile application that detects whether an app is fraudulent. We
use ratings, reviews, and ranking as evidence for detecting fraud. An important aspect of this approach
is that the evidence of fakeness is also shown in the result. We also add a new component, called
security, which helps protect the user's personal data from leakage: while an application is being
installed, users can allow or block access to their personal data. Our application thus combines
assessment of an app's genuineness with protection of the user's data.
VI. REFERENCES
[1] Hengshu Zhu, Hui Xiong, Yong Ge, Enhong Chen, "Discovery of Ranking Fraud for Mobile Apps",
IEEE Transactions on Knowledge and Data Engineering, 2015, ISSN 1041-4347.

[2] Arjun Mukherjee, Abhinav Kumar, Bing Liu, Junhui Wang, Meichun Hsu, Malu Castellanos,
Riddhiman Ghosh, "Spotting Opinion Spammers using Behavioral Footprints", in KDD'13, Chicago,
Illinois, USA, ISBN 978-1-4503-2174-7.

[3] Pu-han Zang, Jing-Zhe Li, Shui Shao, PengWangs, "Pdroid: Detecting Privacy Leakage on Android",
Applied Mechanics and Materials, 2014, pp. 2658-2662.

[4] H. Zhu, H. Cao, E. Chen, H. Xiong, J. Tian, "Exploiting enriched contextual information for mobile
app classification", in Proc. 21st ACM Int. Conf. Inform. Knowl. Manage., 2012, pp. 1617-1621.

[5] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, J. Tian, "Mining personal context-aware preferences for
mobile users", in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 1212-1217.

[6] Clifton Phua, Vincent Lee, Kate Smith, Ross Gayler, "A Comprehensive Survey of Data Mining-based
Fraud Detection Research", Australian Research Council, Baycorp Advantage, and Monash
University, LP0454077, 2008.

[7] Gogu Sandeep, Sachin Malviya, Dheeraj Sapkale, "Data Mining: An Improved Approach for Fraud
Detection", WSDM'08, Palo Alto, California, USA, ACM 978-159593-927-9/08/0002, 2008.

[8] Geli Fei, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh, "Exploiting
Burstiness in Reviews for Review Spammer Detection", Association for the Advancement of
Artificial Intelligence, 2013.

[9] Fangtao Li, Minlie Huang, Yi Yang, Xiaoyan Zhu, "Learning to Identify Review Spam", International
Joint Conference on Artificial Intelligence, 2011.

[10] T. Joachims, "Making large-scale support vector machine learning practical", Advances in
Kernel Methods, 1999.

[11] T. Joachims, "Optimizing Search Engines Using Clickthrough Data", KDD, 2002.

[12] T. Joachims, "Text categorization with support vector machines: Learning with many
relevant features", ECML, 1998.

[13] Y. Ge, H. Xiong, C. Liu, and Z.-H. Zhou, "A taxi driving fraud detection system", in Proc. IEEE 11th
Int. Conf. Data Mining, 2011, pp. 181-190.

[14] D. F. Gleich and L.-H. Lim, "Rank aggregation via nuclear norm minimization", in Proc. 17th ACM
SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 60-68.

[15] T. L. Griffiths and M. Steyvers, "Finding scientific topics", Proc. Nat. Acad. Sci. USA, vol. 101, pp.
5228-5235, 2004.

[16] G. Heinrich, "Parameter estimation for text analysis", Univ. Leipzig, Leipzig, Germany, Tech. Rep.,
http://faculty.cs.byu.edu/~ringger/CS601R/papers/Heinrich-GibbsLDA.pdf, 2008.

[17] Fangtao Li, Minlie Huang, Yi Yang, Xiaoyan Zhu, "Learning to Identify Review Spam", International
Joint Conference on Artificial Intelligence, 2011.

[18] Sihong Xie, Guan Wang, Shuyang Lin, Philip S. Yu, "Review Spam Detection via Temporal Pattern
Discovery", 2014.

[19] N. Jindal and B. Liu, "Opinion spam and analysis", in Proc. Int. Conf. Web Search Data Mining,
2008, pp. 219-230.