
Accenture Case Study - Analysis of Google Play App

Team Name: JARVIS

Team Members: P. Vijaya Prakash, R.D. Pravin Kumar, A. Ganesh Shyam

Institute: IIT MADRAS

Objective:

To analyse the data and identify the apps that have the potential to grow and those which will fade away in
the future.

Data Given to us:

• googleplaystore.csv, which includes the variables app name, size, number of downloads, user rating, app
classification, paid or free, etc. for 10841 apps.
• googleplaystore_user_reviews.csv, which includes the variables app name, review text (plain text),
Sentiment, Sentiment_Polarity, and Sentiment_Subjectivity for 1074 apps.

Methodology

• For an application to be well liked by its customers, it should incorporate customer feedback
through updates.
• Based on our research, we found that a successful app releases at least one update every month.
• For ease of analysis, we remove all special characters from the App and Installs columns in both data sets.
We then convert all app names to lower case in both data sets so that the two can be joined later.
• For this analysis, we consider apps that were updated within the last two months (June 2018 and July 2018)
to have a higher potential for growth. We therefore separate these apps from googleplaystore.csv into
a new table, T1, for further analysis.
• The remaining apps can be considered likely to fade in the future.
• Some data points in T1 have missing values in the Rating column, so we filled them in with the average
rating of other apps in the same category with a similar number of downloads.
• In googleplaystore_user_reviews.csv, some data points contain no review, which makes them redundant,
so we remove them to create a new table, T2.
• In T2, we find the average of Sentiment_Polarity and Sentiment_Subjectivity for each app.
• T1 and T2 are then joined on the app name, giving another table, T3, that contains all the variables we need
for the analysis.
• Some apps may have low Installs but a high Rating, so we cannot use these variables independently to
classify the available data further.
• To address this, we multiply the Rating and Installs columns to create a new column, RI_Factor.
• Sentiment_Subjectivity quantifies how much of an individual app review is personal opinion. Because we are
trying to draw inferences from the opinions of the users, we must factor in Sentiment_Subjectivity along
with Sentiment_Polarity.
• Hence, in table T3 we multiply the Sentiment_Polarity and Sentiment_Subjectivity columns to create a new
column called Sentiment_Factor.
• The apps are sorted by Sentiment_Factor in descending order, and the apps that have a negative
Sentiment_Factor can be considered likely to fade in the future.
• The apps that have a positive Sentiment_Factor are then sorted by RI_Factor.
• The data we have now factors in Rating, Installs, Sentiment_Polarity and Sentiment_Subjectivity.
• We can infer that these apps have good growth potential, in the order obtained after sorting
by RI_Factor.
• Some apps that were updated in the last two months did not have reviews, so we cannot calculate
Sentiment_Factor for them. For these apps we calculate only RI_Factor and sort by it.
• These apps also have the potential to grow, in the order obtained after sorting by RI_Factor,
but we cannot say this with much certainty because we do not have their
Sentiment_Factor.
• We could analyze this data further if we had the time of installation for every app.
• We can also find the apps that may fail in the future by analyzing the reviews. For example, if the words “good
once” occur together, we can infer that the app is not as good as it used to be, so it may fail in the future.
• “good once” is just an example; combinations of other words can also be used to refine the result further.
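
The steps above can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the code used for the case study. The sample rows and app names are made up. The column names Category and Last Updated, and the "July 3, 2018" date format, are assumptions about googleplaystore.csv. "Similar number of downloads" is read as "same Installs value", apps whose rating cannot be filled are dropped, and a Sentiment_Factor of exactly zero (not covered by the text) is grouped with the negatives.

```python
import re
from collections import defaultdict
from statistics import mean


def normalise_name(name):
    """Lower-case an app name and drop special characters so both files join cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def parse_installs(text):
    """Turn a string such as '1,000,000+' into the integer 1000000."""
    return int(re.sub(r"[^0-9]", "", text))


def recently_updated(row, months=("June", "July"), year="2018"):
    """True when 'Last Updated' (assumed format 'July 3, 2018') is in the chosen months."""
    month, _day, yr = row["Last Updated"].replace(",", "").split()
    return month in months and yr == year


def build_ranking(apps, reviews):
    # T1: recently updated apps with cleaned App and Installs values.
    t1 = [
        {
            "App": normalise_name(row["App"]),
            "Category": row["Category"],
            "Installs": parse_installs(row["Installs"]),
            "Rating": row["Rating"],  # None marks a missing rating
        }
        for row in apps
        if recently_updated(row)
    ]

    # Fill missing ratings with the mean rating of apps that share the same
    # Category and Installs value (assumption for "similar number of downloads").
    buckets = defaultdict(list)
    for r in t1:
        if r["Rating"] is not None:
            buckets[(r["Category"], r["Installs"])].append(r["Rating"])
    for r in t1:
        peers = buckets.get((r["Category"], r["Installs"]))
        if r["Rating"] is None and peers:
            r["Rating"] = mean(peers)
    t1 = [r for r in t1 if r["Rating"] is not None]  # assumption: drop unfillable rows

    # T2: discard empty reviews, then collect the scores per app for averaging.
    polarity, subjectivity = defaultdict(list), defaultdict(list)
    for rv in reviews:
        if rv["Sentiment_Polarity"] is None:
            continue
        key = normalise_name(rv["App"])
        polarity[key].append(rv["Sentiment_Polarity"])
        subjectivity[key].append(rv["Sentiment_Subjectivity"])

    # T3: join on the app name, add RI_Factor and Sentiment_Factor, then split.
    grow, fade, unreviewed = [], [], []
    for r in t1:
        r["RI_Factor"] = r["Rating"] * r["Installs"]
        if r["App"] in polarity:
            r["Sentiment_Factor"] = mean(polarity[r["App"]]) * mean(subjectivity[r["App"]])
            # Assumption: a factor of exactly 0 is grouped with the negatives.
            (grow if r["Sentiment_Factor"] > 0 else fade).append(r)
        else:
            unreviewed.append(r)  # no reviews, so ranked with less certainty

    grow.sort(key=lambda r: r["RI_Factor"], reverse=True)
    unreviewed.sort(key=lambda r: r["RI_Factor"], reverse=True)
    return grow, fade, unreviewed


# Made-up sample rows, for illustration only.
apps = [
    {"App": "Photo Editor!", "Category": "PHOTOGRAPHY", "Rating": 4.5,
     "Installs": "1,000,000+", "Last Updated": "July 3, 2018"},
    {"App": "Quick Notes", "Category": "PRODUCTIVITY", "Rating": None,
     "Installs": "10,000+", "Last Updated": "June 20, 2018"},
    {"App": "Task List", "Category": "PRODUCTIVITY", "Rating": 4.0,
     "Installs": "10,000+", "Last Updated": "June 1, 2018"},
    {"App": "Old Game", "Category": "GAME", "Rating": 4.8,
     "Installs": "5,000,000+", "Last Updated": "January 7, 2017"},
    {"App": "Buggy Cam", "Category": "PHOTOGRAPHY", "Rating": 3.0,
     "Installs": "500,000+", "Last Updated": "July 15, 2018"},
]
reviews = [
    {"App": "photo editor", "Sentiment_Polarity": 0.5, "Sentiment_Subjectivity": 0.6},
    {"App": "Photo Editor!", "Sentiment_Polarity": None, "Sentiment_Subjectivity": None},
    {"App": "Buggy Cam", "Sentiment_Polarity": -0.4, "Sentiment_Subjectivity": 0.5},
    {"App": "Task List", "Sentiment_Polarity": 0.2, "Sentiment_Subjectivity": 0.5},
]

grow, fade, unreviewed = build_ranking(apps, reviews)
print([r["App"] for r in grow])        # ['photo editor', 'task list']
print([r["App"] for r in fade])        # ['buggy cam']
print([r["App"] for r in unreviewed])  # ['quick notes']
```

In this sample, "Old Game" is left out of T1 because it was not updated in June or July 2018, and "Quick Notes" receives the rating of the only other app in its category and Installs bracket before being ranked among the apps without reviews.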
