CHAPTER 1
INTRODUCTION
In the digital age, user-generated reviews play a pivotal role in shaping the success of
applications. Understanding the sentiment expressed in these reviews can provide valuable
insights for developers and stakeholders. App Reviews Sentiment Analysis means evaluating
and understanding the sentiments expressed in user reviews of mobile applications (apps). This
project aims to harness the power of sentiment analysis using Python to analyze app reviews
and uncover the sentiments conveyed by users. It involves using data analysis techniques to
determine whether the sentiments in these reviews are positive, negative, or neutral.
1.3 OBJECTIVES
The primary objective of this project is to analyze the sentiment of user reviews for
various LinkedIn applications to provide insights into user satisfaction, areas of improvement,
and overall sentiment trends.
• Gather a comprehensive dataset of user reviews for the LinkedIn app from
sources such as app stores (Google Play Store) to ensure a representative
sample for analysis.
• Implement preprocessing techniques to clean and prepare the textual data,
including tokenization, removal of stop words, and handling of special
characters.
• Utilize Natural Language Processing (NLP) techniques to perform sentiment
analysis on the reviews, categorizing them as positive, negative, or neutral.
• Extract meaningful insights from the sentiment analysis results, identifying
common themes, recurring issues, and areas of user satisfaction or
dissatisfaction.
• Create visualizations, such as charts and graphs, to present the findings in an
accessible and understandable format, facilitating easier interpretation and
decision-making for stakeholders.
• Use sentiment analysis to demonstrate responsiveness to user feedback,
fostering trust and loyalty among users by showing a commitment to
addressing their concerns and enhancing their experience.
Purpose:
The purpose of conducting sentiment analysis on LinkedIn app reviews is
multifaceted, aiming to extract valuable insights that inform decision-making processes
and drive continuous improvement in the mobile application. By analyzing user feedback,
LinkedIn seeks to understand the sentiments expressed by app users, ranging from
satisfaction to frustration, and identify recurring themes or issues affecting their
experience. This analysis enables LinkedIn to prioritize areas for enhancement, address
pain points, and optimize features to better meet user needs and expectations. Moreover,
sentiment analysis helps LinkedIn track trends in user sentiment over time, assess the
effectiveness of implemented changes, and benchmark against competitors. Ultimately,
the overarching purpose of sentiment analysis on LinkedIn app reviews is to foster a
user-centric approach to app development, enhance user satisfaction, and maintain LinkedIn's
position as a leading platform for professional networking and career development.
CHAPTER 2
LITERATURE SURVEY
• Mahmud Isnan et al. Sentiment analysis for TikTok review using VADER
sentiment and SVM model. TikTok, a social networking site for uploading short
videos, has become one of the most popular platforms. Despite this, not all users
are happy with the app; there are criticisms and suggestions, some of which are
posted as reviews of the TikTok app on the Google Play Store. The reviews were
extracted and then used for training a sentiment analysis model.
• Hutto, C.J., & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for
Sentiment Analysis of Social Media Text. Eighth International Conference on
Weblogs and Social Media. The inherent nature of social media content poses
serious challenges to practical applications of sentiment analysis. This paper
presents VADER, a simple rule-based model for general sentiment analysis, and
compares its effectiveness to eleven typical state-of-practice benchmarks including
LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning
oriented techniques relying on Naive Bayes, Maximum Entropy, and Support
Vector Machine (SVM) algorithms.
One of the major drawbacks of the existing system is the absence of automated sentiment
analysis, which would greatly expedite the review process and allow for more
comprehensive analysis of user feedback. Automated sentiment analysis tools can
efficiently categorize and analyze large volumes of data, providing valuable insights into
user sentiments in real-time. This would enable LinkedIn to identify emerging trends and
address issues promptly, ultimately enhancing user satisfaction and retention.
2.1.1 Limitations:
sentiment analysis.
• Dependency on human resources for manual analysis.
• Inefficiency compared to automated processes, leading to delays in
addressing user concerns.
• Increased risk of errors with human involvement in sentiment analysis.
• Limited integration with LinkedIn's backend, reducing the effectiveness of
feedback in product development.
2.2.1 Advantages:
• Increased efficiency through automation of sentiment analysis processes.
• Ability to handle large volumes of user reviews, ensuring scalability.
• Accurate classification of reviews into positive, negative, or neutral
sentiments using advanced machine learning algorithms.
• Intuitive visualizations, such as bar charts and word clouds,
offer comprehensive insights into sentiment distributions and key feedback
themes.
• Real-time access to sentiment analysis results via a user-friendly dashboard.
• Continuous feedback mechanisms for ongoing improvement and
adaptation to evolving user preferences.
• Empowerment of decision-making processes through timely and accurate
insights into user sentiments.
2.2.2 Disadvantages:
• Building and implementing a Python-based application with advanced
machine learning models requires significant initial investment in terms of
time, resources, and expertise.
• The accuracy of sentiment analysis heavily depends on the quality and
relevance of the data collected from user reviews. Poor-quality or biased
data can lead to inaccurate analysis results.
• Machine learning models may struggle to understand nuanced context or
sarcasm present in user reviews, potentially leading to misinterpretation of
sentiments.
• Relying solely on automated sentiment analysis may overlook the
importance of human judgment and qualitative insights, potentially
neglecting crucial aspects of user feedback analysis.
CHAPTER 3
Pandas:
• Pandas is a widely used open-source data manipulation and analysis library
in Python. It provides high-performance data structures and tools for
working with structured data, making it particularly useful for tasks such as
data cleaning, exploration, and analysis.
• Central to Pandas is the DataFrame, a two-dimensional labeled data
structure resembling a spreadsheet or SQL table, which allows for easy
handling and manipulation of data. With Pandas, users can load data from
various file formats such as CSV, Excel, SQL databases, and JSON, and
perform operations like indexing, slicing, filtering, grouping, and
aggregation efficiently.
• Its integration with other Python libraries such as NumPy, Matplotlib, and
Scikit-learn further enhances its capabilities in data analysis and
visualization. Pandas also offers powerful features for handling missing
data, time-series data, and categorical data, providing comprehensive
solutions for real-world data analysis tasks.
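The operations listed above can be illustrated with a small sketch. The review data, the column names (`review`, `rating`), and the values below are hypothetical and are not taken from the project dataset.

```python
import pandas as pd

# Hypothetical sample of app reviews; the column names are assumptions.
reviews = pd.DataFrame(
    {
        "review": [
            "Great app for networking",
            "Keeps crashing on startup",
            "Okay, but messaging is slow",
            None,  # a missing review text
        ],
        "rating": [5, 1, 3, 4],
    }
)

# Handling missing data: drop rows that have no review text.
clean = reviews.dropna(subset=["review"])

# Filtering: keep only low-rated reviews (rating of 2 or less).
low_rated = clean[clean["rating"] <= 2]

# Grouping and aggregation: count the reviews for each rating value.
counts = clean.groupby("rating").size()
```

In a real run the DataFrame would more likely be loaded with `pd.read_csv` from an exported review file than built by hand.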
Matplotlib.pyplot:
• Matplotlib.pyplot is a comprehensive plotting library in Python widely used
for creating static, interactive, and publication-quality visualizations. It is a
part of the Matplotlib library, which offers a wide range of plotting functions
and options to visualize data in various formats.
• Matplotlib.pyplot provides a MATLAB-like interface for generating plots,
allowing users to create line plots, scatter plots, bar plots, histograms, and
many other types of plots with ease. Its versatility and customization options
make it suitable for a wide range of applications, from simple exploratory
data analysis to complex data visualization tasks.
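As an illustration of the pyplot interface described above, the following sketch draws a bar chart of sentiment counts. The counts are made-up sample values, and the `Agg` backend is selected only so that the script also runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

# Hypothetical sentiment counts, for illustration only.
labels = ["Positive", "Neutral", "Negative"]
counts = [120, 45, 60]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(labels, counts, color=["green", "gray", "red"])
ax.bar_label(bars)  # print each count above its bar (Matplotlib 3.4 or later)
ax.set_xlabel("Sentiment")
ax.set_ylabel("Number of reviews")
ax.set_title("Sentiment distribution (sample data)")
```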
Seaborn:
• Seaborn is a Python data visualization library based on Matplotlib that
provides a high-level interface for creating informative and visually
appealing statistical graphics. It is built on top of Matplotlib and integrates
seamlessly with Pandas data structures, making it particularly well-suited
for data analysis tasks.
• Seaborn simplifies the process of creating complex visualizations by
providing a wide range of built-in themes and color palettes, as well as
functions for creating various statistical plots such as scatter plots, bar plots,
box plots, violin plots, and heatmaps.
• These plots are designed to reveal patterns, trends, and relationships within
data quickly and efficiently. Seaborn also offers tools for visualizing
categorical data and for visualizing distributions of univariate and bivariate
data. Additionally, Seaborn provides support for complex multi-plot grids
and facilitates the creation of faceted visualizations for exploring data across
multiple dimensions.
• With its user-friendly interface, elegant default styles, and powerful
capabilities for statistical visualization, Seaborn has become a popular
choice among data scientists and analysts for creating professional-quality
visualizations for data exploration and presentation.
3.2 TECHNOLOGIES:
The Swiggy project involved data collection, cleaning, and exploration using Python
and advanced data exploration using SQL. The data collection process involved creating a
collection of tables that stored important information such as orders, food items, restaurant
menus, restaurants, and user registration information. The data was then cleaned and analyzed
to understand various aspects of the business, including popular cuisines, average price per
dish, top restaurants, and monthly sales. The SQL queries allowed for more advanced
exploration, such as identifying loyal customers, popular dishes, and revenue growth. The
project culminated in the creation of a dynamic Tableau dashboard that provided insights into
customer behaviour, top restaurants, monthly sales, and revenue growth. Overall, this project
demonstrated the importance of data-driven decision-making in the food delivery industry
and the value of data analysis in improving customer experience and increasing revenue.
TextBlob:
TextBlob is a Python library commonly used for sentiment analysis tasks, including
analyzing LinkedIn app reviews. TextBlob provides a straightforward and intuitive interface
for performing sentiment analysis, making it accessible for developers without extensive
NLP expertise. The main technique used in TextBlob for sentiment analysis is a lexicon-based
polarity scoring system: words in the text are matched against a lexicon of polarity scores,
and the matched scores are aggregated into an overall polarity for the text, ranging from
-1.0 (negative) to 1.0 (positive), with values near zero indicating neutral sentiment.
Additionally, TextBlob offers an optional machine learning analyzer, a pre-trained Naive
Bayes classifier. This classifier is trained on a labeled dataset of positive and negative
movie reviews, so it labels new text as positive or negative only and may transfer
imperfectly to app reviews. Furthermore, TextBlob offers capabilities for handling various
aspects of natural language processing, such as tokenization, part-of-speech tagging, and
noun phrase extraction, which can be useful for preprocessing text data before sentiment
analysis.
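The word-level scoring and aggregation described above can be sketched in a few lines. This is a simplified illustration and not TextBlob's implementation: the lexicon `TOY_LEXICON`, the functions `polarity` and `label`, and the 0.1 threshold are all hypothetical. With TextBlob installed, the score would instead come from `TextBlob(text).sentiment.polarity`.

```python
import re

# Hypothetical mini-lexicon mapping words to polarity scores in [-1, 1].
TOY_LEXICON = {
    "great": 0.8, "love": 0.5, "useful": 0.3,
    "slow": -0.3, "crashes": -0.6, "terrible": -1.0,
}

def polarity(text: str) -> float:
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(score: float, threshold: float = 0.1) -> str:
    """Map a polarity score to a sentiment class; the threshold is an assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

for review in ["Great app, love it", "Terrible, it crashes", "I opened the app today"]:
    print(review, "->", label(polarity(review)))
```

Averaging is only one possible aggregation, and the cut-off between neutral and the other two classes is a tuning choice that would normally be checked against a sample of hand-labeled reviews.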
NLP libraries like NLTK (Natural Language Toolkit) provide tools and resources for
processing and analyzing text data. Natural Language Processing (NLP) techniques are
essential for conducting sentiment analysis on LinkedIn app reviews. These techniques
involve a range of processes aimed at understanding and interpreting human language. One
common approach is text preprocessing, which involves tasks like tokenization, stemming,
and lemmatization to break down text into its constituent parts and standardize word forms.
The text can be classified into positive, negative, or neutral sentiments based on features
extracted from the text. Additionally, techniques such as part-of-speech tagging and named
entity recognition may be employed to identify important entities or aspects mentioned in the
reviews, providing deeper insights into specific areas of user satisfaction or dissatisfaction.
Overall, leveraging these NLP techniques enables LinkedIn app developers to gain valuable
insights from user reviews, facilitating data-driven decisions for enhancing the app's
performance and user satisfaction.
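A minimal sketch of the preprocessing step described above is given here, using only the standard library. The stop-word set `STOP_WORDS` and the function `preprocess` are illustrative assumptions. Stemming and lemmatization are left out, since they would normally rely on a library such as NLTK.

```python
import re

# Small illustrative stop-word list; a real project would use a fuller list.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "it", "this", "of", "in", "on"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip special characters, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits, punctuation, symbols with spaces
    tokens = text.split()                  # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("This app is GREAT!!! It crashes on login :( 10/10"))
# → ['app', 'great', 'crashes', 'login']
```

Stripping every non-letter character also discards emoticons and ratings such as "10/10", which can carry sentiment, so a production pipeline might handle those separately.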
Hardware requirements:
• RAM: 4GB
Software Requirements:
• RAM: 4GB
• CPU: Intel Core i5 processor
A laptop with a minimum of 4GB RAM and an Intel Core i3 processor or higher is
generally suitable for basic computing tasks, productivity, and light multitasking. The Intel
Core i3 processor, or a higher-tier processor, provides decent performance for everyday
computing tasks. It can handle activities such as web browsing, word processing, spreadsheet
work, and multimedia consumption smoothly. With a minimum of 4GB RAM, the laptop can
efficiently run basic applications and provide a responsive computing experience. However,
for more demanding tasks or improved multitasking performance, consider opting for a laptop
with 8GB or more RAM. Laptops with these specifications often come with a variety of
operating systems, including Windows, macOS, or Linux. The choice of the operating system
depends on personal preferences and specific application requirements. Storage capacity may
vary, but laptops in this category typically come with at least 128GB to 256GB of solid-state
drive (SSD) storage. SSDs contribute to faster boot times and improved overall system
responsiveness. These laptops typically come with a variety of connectivity options, including
USB ports, HDMI, audio jacks, and wireless connectivity (Wi-Fi and Bluetooth). Some models
may also include additional features such as USB-C ports.
Python:
Features:
etc).
• Python has syntax that allows developers to write programs with fewer lines
Google Colab:
CHAPTER 4
SYSTEM DESIGN
• Input Data:
Providing data for app review sentiment analysis on the Play Store involves users
submitting their feedback and ratings regarding their experiences with the application.
Users typically access the Play Store through the Google Play app on their Android
devices or via the web interface. Once on the app's page, users can navigate to the
review section where they have the option to rate the app on a scale from one to five
stars and provide written feedback. This feedback can include comments on various
aspects of the app such as usability, performance, features, bugs, and overall
satisfaction. Users may also express their likes, dislikes, suggestions for improvements,
or report any issues they encounter.
• Data Processing:
• Feature Extraction:
Term Frequency-Inverse Document Frequency weighs the importance of each
word based on its frequency in the review and across the entire corpus of reviews, thus
prioritizing words that are more informative for sentiment analysis. Word embeddings,
on the other hand, represent words in a continuous vector space, capturing semantic
relationships between words. These numerical feature representations encode the
semantic meaning of the reviews, allowing machine learning algorithms to learn
patterns and relationships between words and sentiments. By performing feature
extraction, the LinkedIn app review data is transformed into a format suitable for
training sentiment analysis models, enabling accurate prediction of the sentiment
expressed in the reviews.
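The TF-IDF weighting described above can be sketched from scratch as follows. This uses one common textbook variant, term frequency multiplied by log(N / document frequency). Libraries such as scikit-learn apply smoothed variants, so their numbers will differ. The function name `tf_idf` and the sample reviews are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized reviews.
docs = [
    ["app", "great", "networking"],
    ["app", "crashes", "login"],
    ["app", "slow", "messaging"],
]
weights = tf_idf(docs)
```

Here "app" occurs in every review, so its weight is zero, while a word such as "great" that occurs in only one review receives a positive weight. This is the sense in which TF-IDF prioritizes the more informative words.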
users. NLP enables the analysis of textual data to discern whether the sentiment
conveyed in the reviews is positive, negative, or neutral. Techniques such as
tokenization, part-of-speech tagging, and syntactic parsing are applied to break down
the text into smaller units, identify the grammatical structure, and extract relevant
features. By leveraging NLP techniques and machine learning models, sentiment
analysis provides valuable insights into user perceptions and sentiments towards the
LinkedIn app, enabling developers and stakeholders to make data-driven decisions for
enhancing user experience and satisfaction.
CHAPTER 5
IMPLEMENTATION
5.1 CODING
# import python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # visualizing data
%matplotlib inline
import seaborn as sns
# import csv file
df = pd.read_csv(r'C:\Users\RAJU\Desktop\MY Recent files\Python_Diwali_Sales_Analysis-main\Diwali Sales Data.csv', encoding= 'unicode_escape')
# to avoid encoding error, use 'unicode_escape'
df.shape
df.head(4)
df.info()
#drop unrelated/blank columns
df.drop(['Status', 'unnamed1'], axis=1, inplace=True)
#check for null values
pd.isnull(df).sum()
# drop null values
df.dropna(inplace=True)
# change data type
df['Amount'] = df['Amount'].astype('int')
# to check the datatype
df['Amount'].dtypes
df.columns
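# rename column
# rename() returns a new DataFrame; the result is not assigned here,
# so df keeps the 'Marital_Status' column used in the plots below
df.rename(columns= {'Marital_Status':'Shaadi'})
# describe() method returns description of the data in the DataFrame (i.e. count, mean, std, etc)
df.describe()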
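# total amount/sales by gender
sales_gen = df.groupby(['Gender'],
as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
sns.barplot(data = sales_gen, x = 'Gender',y= 'Amount')
# total number of orders from top 10 states
sales_state = df.groupby(['State'],
as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)
sns.set(rc={'figure.figsize':(15,5)})
sns.barplot(data = sales_state, x = 'State',y= 'Orders')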
# total amount/sales from top 10 states
sales_state = df.groupby(['State'],
as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)
sns.set(rc={'figure.figsize':(15,5)})
sns.barplot(data = sales_state, x = 'State',y= 'Amount')
ax = sns.countplot(data = df, x = 'Marital_Status')
sns.set(rc={'figure.figsize':(7,5)})
# label each bar with its count
for bars in ax.containers:
    ax.bar_label(bars)
sales_state = df.groupby(['Marital_Status', 'Gender'],
as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
sns.set(rc={'figure.figsize':(6,5)})
sns.barplot(data = sales_state, x = 'Marital_Status',y= 'Amount', hue='Gender')
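sns.set(rc={'figure.figsize':(20,5)})
ax = sns.countplot(data = df, x = 'Occupation')
# total amount/sales by occupation (regroup so the 'Occupation' column exists)
sales_state = df.groupby(['Occupation'],
as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Occupation',y= 'Amount')
sns.set(rc={'figure.figsize':(21,5)})
ax = sns.countplot(data = df, x = 'Product_Category')
# total amount/sales from top 10 product categories
sales_state = df.groupby(['Product_Category'],
as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Product_Category',y= 'Amount')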
# top 10 most sold products
sales_state = df.groupby(['Product_ID'],
as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Product_ID',y= 'Orders')
5.2 SNAPSHOTS:
Fig 5.2.1: Bar chart of Gender and its count.
From the above graphs we can see that most of the buyers are female and that the
purchasing power of females is greater than that of males.
CHAPTER 6
SOFTWARE TESTING
Testing is the process of executing a system with the aim of finding errors. It can also be
defined as the process of identifying, isolating, and correcting defects so that the customer
is ultimately assured that the system is free of defects. Software testing is a very important
element of quality assurance, and it serves as a review of the SRS, design, coding, and
implementation of the proposed system.
Test Planning:
A test plan is the document that describes the procedure to be followed in performing the
various tests on the whole application. This document covers the scope and objectives of the
testing, the areas that are to be tested and the areas that should not be tested, the
scheduling of available resources, the areas that need to be automated, and the various tools
that are used for testing.
Test Development:
Test development involves development of test cases and their procedural preparation i.e.
description of the developed test cases.
Various types of testing that are done on the system are as follows:
• Unit testing
• Integration testing
• System testing
• Unit Testing:
As the name suggests, this type of testing is done on small units of the system. A part
of the system is considered as a unit and tested on its own. Consider the login page as
an example: the user or the administrator can enter their respective home pages only
after giving a valid username and password. Validating this behaviour, with Login
treated as a unit, is an instance of unit testing.
• Integration Testing:
Integration testing checks whether the system functions correctly when two or more
units are combined. It exercises the interfaces between units and gives information
about the order in which units, modules, sub-systems, and finally the entire system
are integrated.
• System Testing:
This testing technique deals with the process of testing the system as a whole. At this
stage, remaining defects and interface errors are uncovered and removed in order to
achieve the good functioning of the whole system. System testing is the final part of
the whole testing process.
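The login example given under unit testing can be sketched with Python's built-in `unittest` module. The credential store `VALID_USERS`, the function `validate_login`, and the helper `run_tests` are hypothetical names used only for illustration; they are not part of the project code.

```python
import unittest

# Hypothetical credential store and validator, used only to illustrate a unit.
VALID_USERS = {"admin": "s3cret"}

def validate_login(username: str, password: str) -> bool:
    """Return True only when the username exists and the password matches."""
    return username in VALID_USERS and VALID_USERS[username] == password

class LoginTests(unittest.TestCase):
    def test_valid_credentials(self):
        self.assertTrue(validate_login("admin", "s3cret"))

    def test_wrong_password(self):
        self.assertFalse(validate_login("admin", "wrong"))

    def test_unknown_user(self):
        self.assertFalse(validate_login("guest", "s3cret"))

def run_tests() -> unittest.TestResult:
    """Run the suite without exiting the interpreter (convenient in notebooks)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoginTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run_tests()
```

Each test method checks one behaviour of the unit in isolation. An integration test would instead exercise this validator together with the pages that depend on it.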
Test Case 1
Result Successful
Test Case 1
Result Successful
CHAPTER 7
CONCLUSION
Integrating advanced sentiment analysis capabilities into the review system of the LinkedIn
app holds tremendous potential for enhancing user experience and driving continuous
improvement. Ultimately, leveraging sentiment analysis not only allows LinkedIn to better
understand and respond to user feedback but also empowers the platform to deliver more
personalized and satisfying experiences to its diverse user base, solidifying its position as a
leading professional networking platform in the digital landscape. Swiggy sales data reviews
using Python presents a comprehensive approach to understanding user opinions and
experiences with the app. By deciphering the sentiment polarity of reviews, whether positive,
negative, or neutral, this analysis provides developers and marketers with actionable
information to enhance the app's user experience. On the other hand, there exists a cohort of
users who express dissatisfaction with certain aspects of the LinkedIn app. Common grievances
include occasional technical glitches, such as slow loading times or crashes, which can disrupt
the user experience. Some users also report limitations in messaging functionality, finding it
cumbersome or inadequate for effective communication with connections. Furthermore, there
are those who perceive a lack of significant innovation in the app's development, noting a
stagnation in features or improvements compared to other social networking platforms.
CHAPTER 8
FUTURE ENHANCEMENT
In the future, several enhancements can be integrated into the LinkedIn app review sentiment
analysis project to further refine its effectiveness and applicability.
• Instead of just categorizing reviews into positive, negative, or neutral, a more nuanced
approach could be adopted. This might involve identifying specific emotions like joy,
frustration, satisfaction, etc., allowing for a more detailed understanding of user
feedback.
• Analyzing different aspects of the LinkedIn app separately, such as user interface,
functionality, customer support, etc., can provide more targeted insights.
• LinkedIn has a global user base, so supporting sentiment analysis for reviews in
multiple languages would be beneficial. This would involve training models on data
in various languages and ensuring accurate sentiment analysis across different
linguistic contexts.
• Integrate sentiment analysis directly into LinkedIn's feedback loop, enabling faster
response to user concerns and facilitating continuous improvement of the app based
on real-time insights from user feedback.
BIBLIOGRAPHY
1. S Kurniawan et al. Text Mining Pre-Processing Using Data Framework and RapidMiner for
Indonesian Sentiment Analysis.
2. Mahmud Isnan et al. Sentiment analysis for TikTok review using VADER sentiment and
SVM model.
3. Hutto, C.J., & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for
Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and
Social Media.
5. Al Amrani Y, Lazaar M, El Kadiri KE (2018) Random forest and support vector machine
based hybrid approach to sentiment analysis. Procedia Comput Sci 127:511–520