
A

Project Report
On

Amazon Review (Sentiment Analysis)


Submitted in partial fulfillment of the requirement for the IV semester
Bachelor of Computer Science
By

Harshit Joshi
Vineet Goswami
Under the Guidance of
Mr. Ravindra Koranga
Assistant Professor
Department of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


GRAPHIC ERA HILL UNIVERSITY, BHIMTAL CAMPUS
SATTAL ROAD, P.O. BHOWALI,
DISTRICT NAINITAL - 263132
2022-2023
STUDENT’S DECLARATION

We, Harshit Joshi and Vineet Goswami, hereby declare that the work presented in the project
entitled “Amazon Review (Sentiment Analysis)”, in partial fulfillment of the requirement for
the award of the degree of B.Tech in the session 2022-2023, is an authentic record of our own
work carried out under the supervision of Mr. Ravindra Koranga, Assistant Professor,
Department of CSE, Graphic Era Hill University, Bhimtal.

The matter embodied in this project has not been submitted by us for the award of any other
degree.

Date:

Harshit Joshi

Vineet Goswami
CERTIFICATE

The project report entitled “Amazon Review (Sentiment Analysis)”, being submitted by Harshit
Joshi and Vineet Goswami to Graphic Era Hill University, Bhimtal Campus, is a bonafide record
of work carried out by them. They have worked under my guidance and supervision and have
fulfilled the requirement for the submission of the report.

(Mr. Ravindra Koranga)
Project Guide

(Dr. Ankur Bisht)
HOD, CSE Dept.


ACKNOWLEDGEMENT

We take immense pleasure in thanking the Honorable Mr. Ravindra Koranga (Assistant Professor,
CSE, GEHU Bhimtal Campus) for permitting us to carry out this project work under his excellent
and optimistic supervision. This has all been possible due to his inspiration, able guidance,
and useful suggestions, which helped us to develop as creative researchers and complete the
work in time.

Words are inadequate to offer our thanks to God for providing us with everything that we need.
We also want to extend our thanks to our President, Prof. (Dr.) Kamal Ghanshala, for providing
us with all the infrastructure and facilities, without which this work would not have been
possible.

Many thanks to Professor Dr. Manoj Chandra Lohani (Director, GEHU Bhimtal) and the other
faculty members for their insightful comments, constructive suggestions, valuable advice, and
time in reviewing this report.

Finally, yet importantly, we would like to express our heartiest thanks to our beloved parents
for their moral support, affection, and blessings. We would also like to pay our sincere thanks
to all our friends and well-wishers for their help and wishes for the successful completion of
this project.

Harshit Joshi

Vineet Goswami
TABLE OF CONTENTS

Declaration
Certificate
Acknowledgement
Abstract
Table of Contents
List of Publications
List of Tables
List of Figures
List of Symbols
List of Abbreviations

CHAPTER 1: INTRODUCTION
1.1 Objective
1.2 Background and Motivations
1.3 Problem Statement
1.4 Objectives and Research Methodology
1.5 Project Organization

CHAPTER 2: PROPOSED SYSTEM
2.1 History
2.2 …

CHAPTER 3: S/W AND H/W REQUIREMENTS (UP TO FULLEST EXTENT)
3.1 S/W and H/W requirements (up to fullest extent)
3.2 Resources and Technology used

CHAPTER 4: CODING OF FUNCTION
4.1 Basic modules of the project

CHAPTER 5: LIMITATIONS (WITH PROJECT)

CHAPTER 6: CONCLUSION

REFERENCES
PROJECT ABSTRACT

The Amazon Review (Sentiment Analysis) Application is a machine learning project aimed at
developing a robust and accurate model for sentiment classification of product reviews.
Sentiment analysis, a subfield of Natural Language Processing (NLP), plays a significant role
in understanding public opinion and has applications in domains such as marketing,
recommendation systems, and social media analysis.

For this project, an Amazon jewelry review dataset was obtained from Kaggle.com, comprising
product reviews and their corresponding star ratings. The dataset was preprocessed using NLP
techniques to remove noise, standardize the text, and extract relevant features. Machine
learning algorithms, including Naive Bayes, Support Vector Machines, and Neural Networks, were
implemented and trained on the labeled dataset. The model's performance was evaluated using
metrics such as accuracy, precision, and the confusion matrix.

The application provides a user-friendly interface where users can input a product review, and
the model predicts the sentiment as positive or negative. It offers insight into public
sentiment regarding a product and can assist business stakeholders in decisions related to
marketing strategies, jewelry recommendations, and customer targeting.

The project contributes to the field of sentiment analysis by demonstrating the effectiveness
of machine learning techniques in classifying sentiment in product reviews. The results
showcase the application's potential in analyzing and understanding public opinion and its
relevance to business. Future improvements and extensions can be explored to enhance the
model's accuracy and efficiency and to address more specific sentiment categories or
domain-specific challenges.

INTRODUCTION

The Amazon Review (Sentiment Analysis) Application is a machine learning project that tackles
the task of sentiment classification in product reviews. With the exponential growth of
user-generated content and the increasing popularity of online platforms for discussing
product quality, understanding the sentiment expressed in these reviews has become crucial for
business stakeholders. Sentiment analysis, a subfield of Natural Language Processing (NLP),
offers insight into public opinion and can assist in decisions such as marketing strategy and
audience targeting.

The objective of this project is to develop a robust and accurate sentiment analysis model
specifically tailored to product reviews. By automatically classifying the sentiment expressed
in a given review as positive or negative, the application can help business owners,
manufacturers, critics, and customers gauge the overall reception of a product. Additionally,
it provides a means to analyze the factors influencing positive or negative sentiment, enabling
a better understanding of audience preferences and informing future jewelry production and
marketing endeavors.

To accomplish this, we obtained a dataset of Amazon jewelry reviews from Kaggle.com, a popular
platform for accessing and sharing datasets. This dataset serves as the foundation for training
and evaluating our machine learning model. Using NLP techniques, we preprocess the raw text
data, removing noise and standardizing the text for further analysis. Features are then
extracted from the preprocessed text using techniques such as bag-of-words, TF-IDF, or word
embeddings. These features serve as the input to our machine learning model.
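
To make the difference between bag-of-words and TF-IDF features concrete, the following is a minimal sketch using scikit-learn's `CountVectorizer` and `TfidfVectorizer`. The three example reviews are hypothetical and are not taken from the project dataset; the report does not state which vectorizer settings were actually used, so the defaults shown here are an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical, already-preprocessed reviews (not from the project dataset).
reviews = [
    "beautiful ring great quality",
    "ring broke after one week",
    "great necklace beautiful gift",
]

# Bag-of-words: each column counts how often a vocabulary word occurs in a review.
bow = CountVectorizer()
bow_matrix = bow.fit_transform(reviews)

# TF-IDF: same vocabulary, but counts are re-weighted so that words appearing
# in many reviews contribute less than rarer, more distinctive words.
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(reviews)

print(sorted(bow.vocabulary_))          # the learned vocabulary
print(bow_matrix.toarray())             # raw word counts per review
print(tfidf_matrix.toarray().round(2))  # TF-IDF weights per review
```

In the first review, "quality" occurs in only one document while "ring" occurs in two, so TF-IDF assigns "quality" the larger weight even though both have a raw count of 1.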

Various machine learning algorithms, including Naive Bayes, Support Vector Machines, and Neural
Networks, are implemented and trained on the labeled dataset. Through cross-validation and
hyperparameter tuning, we optimize the model's performance, aiming for high accuracy and
robustness in sentiment classification. The model is then evaluated using appropriate metrics
to assess how well it distinguishes between positive and negative sentiment in product reviews.
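
The cross-validation and hyperparameter tuning step can be sketched as a grid search over a scikit-learn pipeline. This is an illustration under stated assumptions: the toy reviews, the choice of Logistic Regression, and the parameter grid are hypothetical and are not the settings used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data; 1 = positive, 0 = negative.
texts = [
    "love this ring", "beautiful necklace", "great quality earrings",
    "perfect gift", "very happy with it", "excellent bracelet",
    "broke after a week", "poor quality chain", "very disappointed",
    "looks cheap", "stone fell out", "waste of money",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Vectorizer and classifier are tuned together, so the vocabulary is
# re-learned on the training part of every fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed search space: unigrams vs. unigrams+bigrams, and three
# regularization strengths.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# 3-fold stratified cross-validation, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping the vectorizer inside the pipeline matters: fitting TF-IDF on the full dataset before splitting would let information from the validation folds leak into training.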

By developing the Amazon Review (Sentiment Analysis) Application, we provide a tool for
business stakeholders to gain insight into the reception and sentiment surrounding their
products. This project contributes to the field of sentiment analysis and highlights the
practical applications of NLP and machine learning in understanding public opinion. The
subsequent sections of this report describe the project's objectives, problem statement,
resources and technology used, limitations, and conclusions.
OBJECTIVE

The objectives of the Amazon Review (Sentiment Analysis) Application project are as
follows:

1. Develop a sentiment analysis model: Build a robust and accurate machine learning model
capable of classifying the sentiment of Amazon reviews as positive or negative. The model
should be trained on a labeled dataset and optimized to achieve high accuracy and
performance.
2. Preprocess product review data: Implement data preprocessing techniques to clean and
normalize the raw product review text. This involves removing noise, such as HTML tags,
punctuation, and stop words, and applying text normalization techniques like tokenization
and stemming.
3. Extract relevant features: Employ feature extraction techniques, such as bag-of-words,
TF-IDF, or word embeddings, to extract meaningful features from the preprocessed product
review text. These features will serve as inputs to the sentiment analysis model.
4. Train and evaluate machine learning algorithms: Implement and train various machine
learning algorithms, such as Naive Bayes and Logistic Regression, using the labeled
dataset. Evaluate the performance of the models using appropriate evaluation metrics and
select the best-performing algorithm for sentiment classification.
5. Create an interactive application: Develop a user-friendly interface that allows users to
input product reviews and obtain predicted sentiment labels. The application should provide
a seamless and intuitive experience for users to interact with the sentiment analysis model.
6. Analyze and interpret results: Perform a comprehensive analysis of the model's
performance, including accuracy, precision, recall, and F1-score. Identify patterns,
challenges, and limitations encountered during the project and provide insights into the
strengths and weaknesses of the implemented approach.
7. Discuss potential applications and future improvements: Explore the potential
applications of the Amazon Review (Sentiment Analysis) Application in business and related
domains. Discuss possible future enhancements, such as incorporating domain-specific
features, leveraging deep learning techniques, or expanding the application to handle
additional sentiment categories.

By achieving these objectives, the project aims to contribute to the field of sentiment
analysis, provide valuable insights into product reviews, and offer a practical tool for
analyzing and understanding public sentiment in the context of jewelry products.
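
The preprocessing described in objective 2 can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the contraction table and stop-word set are small hypothetical samples, and stemming is left out (a stemmer such as NLTK's `PorterStemmer` could be applied to each returned token).

```python
import re
import string

# Small illustrative tables; a real project would use fuller lists
# (for example NLTK's English stop-word list). "not" is deliberately
# kept out of the stop words because negation carries sentiment.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "n't": " not",   # generic fallback: "doesn't" -> "does not"
    "it's": "it is",
    "i'm": "i am",
}
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of", "i", "am"}


def preprocess(review: str) -> list[str]:
    """Lowercase, strip HTML, expand contractions, drop punctuation and stop words."""
    text = review.lower()
    text = re.sub(r"<[^>]+>", " ", text)            # remove HTML tags
    for short, full in CONTRACTIONS.items():        # expand contractions
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()                           # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>It's a BEAUTIFUL ring, but the clasp doesn't close!</p>"))
# ['beautiful', 'ring', 'but', 'clasp', 'does', 'not', 'close']
```

Contractions are expanded before punctuation is removed; otherwise the apostrophes would be stripped first and "doesn't" would become the unrecognizable token "doesnt".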
PROBLEM STATEMENT

The Amazon Review (Sentiment Analysis) Application project addresses the following
problem statement:

Develop an accurate and efficient sentiment analysis model for product reviews that can
classify the sentiment expressed in a given review as positive or negative. The objective is
to provide a reliable tool for business stakeholders to gauge the overall reception and
sentiment surrounding their products, enabling them to make informed decisions regarding
marketing strategies and audience targeting.
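
Because the dataset provides star ratings while the model predicts positive or negative, the ratings have to be converted into binary labels before training. The report does not state which thresholds were used, so the mapping below is an assumption for illustration only: 4-5 stars count as positive, 1-2 stars as negative, and 3-star reviews are skipped as ambiguous.

```python
from typing import Optional


def rating_to_sentiment(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment label.

    Assumed thresholds (not stated in the report): 4-5 stars are positive,
    1-2 stars are negative, and 3-star reviews return None so the caller
    can drop them.
    """
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


ratings = [5, 1, 3, 4, 2]
labels = [rating_to_sentiment(r) for r in ratings]
print(labels)  # ['positive', 'negative', None, 'positive', 'negative']
```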

Project Organization
The project has two members, with responsibilities divided as follows:

- Harshit Joshi: training the model and dumping (serializing) it.
- Vineet Goswami: front end and testing of the project.

Present Status of development of project

- All paperwork is completed.
- The planning phase of the application is completed.
- Model training is completed.
- The front end is completed.
- The project is in the deployment phase.
Resources and Technology used

Model training:

The Amazon Review (Sentiment Analysis) Application project used the following key resources
and technologies to develop and implement the sentiment analysis model:

1. Dataset:
- The Amazon review dataset was obtained from Kaggle.com, a popular platform for accessing
and sharing datasets. It consists of product reviews along with their corresponding star
ratings.

2. Programming Language:
- Python was used as the primary programming language for developing the Amazon Review
(Sentiment Analysis) Application. Python offers a rich ecosystem of libraries and frameworks
for natural language processing and machine learning tasks.

3. Libraries/Frameworks:
- Natural Language Toolkit (NLTK): NLTK is a Python library used for various NLP tasks,
including text preprocessing, tokenization, stemming, and sentiment analysis.
- Scikit-learn: scikit-learn is a widely used machine learning library that provides efficient
implementations of classification algorithms, evaluation metrics, and tools for model
training and evaluation.

4. Text Preprocessing Techniques:
- Text preprocessing techniques were employed to clean and normalize the product review text.
These include expanding contractions; removing HTML tags, punctuation, and stop words; and
text normalization steps such as tokenization and stemming.

5. Feature Extraction Techniques:
- Various feature extraction techniques were employed to represent the preprocessed text data
as numerical features that can be used as input to the machine learning model. These
techniques include bag-of-words, TF-IDF (Term Frequency-Inverse Document Frequency), and word
embeddings.

6. Machine Learning Algorithms:
- Several machine learning algorithms were implemented and evaluated for sentiment
classification. These include Naive Bayes, Support Vector Machine, Decision Tree, Random
Forest, and Logistic Regression. The choice of algorithm was based on performance and
suitability for the given task.

7. Development Environment:
- PyCharm (an integrated development environment) and Jupyter Notebook were used to write and
execute the Python code for the Amazon Review (Sentiment Analysis) Application.
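
The resources listed above fit together roughly as in the following sketch, which trains a TF-IDF plus Logistic Regression pipeline with scikit-learn and dumps it to disk with joblib so that a front end can load it later. This is a minimal sketch under assumptions: the toy reviews, the choice of classifier, and the file name `sentiment_model.joblib` are hypothetical, and the project's actual notebook code is not reproduced in this report.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the Kaggle reviews.
texts = [
    "love this ring", "beautiful necklace", "great quality earrings",
    "perfect gift", "very happy with it", "excellent bracelet",
    "broke after a week", "poor quality chain", "very disappointed",
    "looks cheap", "stone fell out", "waste of money",
]
labels = ["positive"] * 6 + ["negative"] * 6

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Feature extraction and classifier bundled into one object.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# "Dump" the fitted pipeline to disk and load it back.
model_path = Path(tempfile.gettempdir()) / "sentiment_model.joblib"  # hypothetical file name
joblib.dump(model, model_path)
restored = joblib.load(model_path)
print(restored.predict(["beautiful ring, great quality"]))
```

Saving the whole pipeline, rather than the classifier alone, keeps the fitted vocabulary together with the model, so the application can pass raw review text straight to `predict`.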

Front-end Development with Streamlit:

Streamlit, a Python library, was used for the front-end development of the Amazon Review
(Sentiment Analysis) Application. Streamlit simplifies the process of creating interactive and
user-friendly web applications directly from Python scripts. With Streamlit, developers can
quickly build and deploy data-driven applications without extensive web development
experience.

Streamlit offers the following advantages for front-end development in the Amazon Review
(Sentiment Analysis) Application:

1. Interactive User Interface:
- Streamlit allows for the creation of an intuitive and interactive user interface for the
application. Developers can incorporate widgets, such as sliders, dropdowns, checkboxes, and
buttons, to provide user input options and enhance the user experience.

2. Rapid Prototyping:
- Streamlit's simplicity and ease of use enable rapid prototyping and iteration. Developers can
quickly visualize and test different components and functionalities of the application, making
it efficient to refine the user interface based on requirements and feedback.

3. Seamless Integration with Python Backend:
- Streamlit integrates directly with the existing Python backend of the application. Because
the interface is itself a Python script, it can call the machine learning model or data
processing modules without a separate communication layer.

4. Real-time Updates:
- Streamlit reruns the script and refreshes the interface whenever a user interacts with a
widget. This allows for dynamic visualization of results, so users see immediate feedback as
they input product reviews or change settings.

5. Deployment and Sharing:
- Streamlit simplifies the process of deploying the application to the web or sharing it with
others. Developers can deploy the application on platforms like Heroku or share it as a
standalone web application that users can access without installing any additional software.

By using Streamlit for front-end development, the Amazon Review (Sentiment Analysis)
Application benefits from an interactive user interface, rapid prototyping, direct integration
with the Python backend, real-time updates, and simplified deployment and sharing options.
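
A front end of the kind described above can be sketched in a few lines of Streamlit. This is an illustrative sketch, not the project's actual front-end code: the model file name `sentiment_model.joblib` and the helper `describe` are hypothetical, and the script assumes a scikit-learn pipeline was saved earlier with `joblib.dump`.

```python
from pathlib import Path

# Hypothetical path to a pipeline saved earlier with joblib.dump.
MODEL_PATH = Path("sentiment_model.joblib")


def describe(label: str) -> str:
    """Turn a predicted label into the message shown to the user."""
    return f"Predicted sentiment: {label}"


def main() -> None:
    # Imported here so that describe() can be tested without Streamlit installed.
    import joblib
    import streamlit as st

    st.title("Amazon Review (Sentiment Analysis)")
    review = st.text_area("Enter a product review")

    # Streamlit reruns this script on every interaction; the block below
    # executes only on the run triggered by clicking the button.
    if st.button("Predict"):
        if not review.strip():
            st.warning("Please enter a review first.")
        elif not MODEL_PATH.exists():
            st.error(f"Model file not found: {MODEL_PATH}")
        else:
            model = joblib.load(MODEL_PATH)
            label = model.predict([review])[0]
            st.success(describe(label))


if __name__ == "__main__":
    main()
```

Saved as, say, `app.py`, the script would be started with `streamlit run app.py`, which serves the page in the browser.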
LIMITATIONS
Despite the successful development of the Amazon Review (Sentiment Analysis) Application, it
is important to acknowledge the limitations encountered during the project. These limitations
may affect the application's performance, generalizability, and usability:

1. Dataset Bias:
- The Amazon review dataset used for training the sentiment analysis model may contain
inherent biases. The dataset's representativeness, diversity, and size affect the model's
ability to generalize to a broader range of product reviews. Care should be taken to ensure
the dataset adequately captures various jewelry categories, languages, and cultural contexts.

2. Subjectivity and Context:
- Sentiment analysis is a challenging task due to the subjectivity and context-dependency of
human sentiment. The model's performance may vary when applied to product reviews that
involve complex emotions, sarcasm, irony, or subtle nuances. Contextual understanding,
including jewelry-specific or cultural references, may pose challenges for accurate sentiment
classification.

3. Model Overfitting or Underfitting:
- The model's performance relies heavily on the quality and size of the training dataset.
Overfitting or underfitting may occur if the dataset is insufficient or unbalanced. Addressing
these issues may require collecting additional labeled data or employing techniques such as
data augmentation or transfer learning.

4. Limited Sentiment Categories:
- The current implementation focuses on binary sentiment classification (positive/negative).
However, real-world product reviews may contain more nuanced sentiments, such as neutral,
mixed, or specific emotional expressions. Expanding the sentiment categories could enhance the
application's versatility and accuracy in capturing a wider range of sentiments.

5. Language Limitations:
- The current implementation assumes the product reviews are in a specific language. The
model's performance may vary when applied to reviews in different languages or when faced with
code-switching or mixed-language texts. Language-specific preprocessing and feature extraction
techniques may be required to address this limitation.

6. Scalability and Real-time Updates:
- As the number of customers and product reviews increases, the application's performance and
response time may be affected. Techniques such as efficient indexing, caching, and distributed
processing could be explored to overcome scalability limitations.

7. User Interface Design:
- While Streamlit provides a user-friendly interface, the current front-end design may have
limitations in terms of aesthetics, customization, and responsiveness. Additional effort may
be required to improve the interface's visual appeal and ensure a good user experience across
different devices and screen sizes.

Understanding and addressing these limitations will contribute to the ongoing improvement of
the Amazon Review (Sentiment Analysis) Application, ensuring its reliability, accuracy, and
usability in practical scenarios.
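
As an illustration of the unbalanced-data concern in limitation 3, the sketch below inspects the label distribution and computes per-class weights using the formula n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's `class_weight="balanced"` option. The label counts are hypothetical; the report does not give the actual class distribution of the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes receive larger weights, so their errors count for
    more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical skewed dataset: 8 positive reviews for every 2 negative ones.
labels = ["positive"] * 8 + ["negative"] * 2

print(Counter(labels))
print(balanced_class_weights(labels))  # {'positive': 0.625, 'negative': 2.5}
```

With scikit-learn, the same effect can be requested directly by passing `class_weight="balanced"` to classifiers that support it, such as `LogisticRegression`.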
IMPLEMENTATION

During the implementation process, developers must write enough comments inside the code so
that anybody who works on the code later can understand what has already been written. Writing
good comments is very important because all other documents, no matter how good they are, will
eventually be lost. Ten years after the initial work, the only information left may be what is
present inside the code in the form of comments. Development tools also play an important role
in this phase of the project. Good development tools save developers a lot of time and save
money through improved productivity. The most important time-saving tools are editors and
debuggers: a good editor helps a developer write code quickly, and a good debugger helps make
the written code operational in a short period. Before starting the coding process, it is worth
spending some time choosing good development tools.
JUPYTER CODE:
STREAMLIT CODE (Frontend):
CONCLUSION
The Amazon Review (Sentiment Analysis) Application project aimed to develop a robust and
accurate sentiment analysis model for product reviews, providing insight into public opinion
and assisting business stakeholders with decision-making. Throughout the project, various
resources and technologies were used, including an Amazon review dataset from Kaggle, the
Python programming language, libraries such as NLTK and scikit-learn, text preprocessing
techniques, feature extraction methods, and machine learning algorithms.

The project achieved its objectives by implementing a sentiment analysis model trained on the
labeled product review dataset. The model demonstrated commendable performance in classifying
the sentiment of product reviews as positive or negative. Its effectiveness and reliability
were assessed using evaluation metrics such as accuracy, precision, recall, and F1-score.
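
For reference, the four metrics named above can be computed from the counts of a binary confusion matrix as in the sketch below. The true and predicted labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the project.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 4 positive and 4 negative reviews, 2 misclassified.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "positive"]

print(binary_metrics(y_true, y_pred))
# tp=3, fp=1, fn=1, tn=3, so accuracy, precision, recall, and F1 are all 0.75
```

scikit-learn provides the same quantities through `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix` in `sklearn.metrics`.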

The Amazon Review (Sentiment Analysis) Application holds significant potential for various
business applications. Product-based companies and marketers can use the application to gain
insight into audience sentiment and the reception of their products. It enables them to make
data-driven decisions regarding marketing strategies, product recommendations, and customer
targeting, ultimately contributing to better audience engagement and satisfaction.

The project also highlighted the importance of natural language processing and machine learning
techniques in sentiment analysis tasks. By employing text preprocessing, feature extraction,
and machine learning algorithms, the model processed and analyzed large volumes of product
review data and produced sentiment predictions.

Throughout the development process, Streamlit was used for front-end development, enabling the
creation of an interactive user interface. Streamlit simplified the development and deployment
of the application and the communication between the front-end and back-end components.

In conclusion, the Amazon Review (Sentiment Analysis) Application project has addressed the
problem of sentiment classification in product reviews. It has demonstrated the potential of
machine learning and NLP techniques to analyze and understand public opinion, and its outcomes
offer practical applications for business stakeholders. Moving forward, further improvements
can be explored to expand the application's capabilities and address additional sentiment
categories or specific domains within the industry.
REFERENCES

[1] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends
in Information Retrieval, 2(1-2), 1-135.
[2] Kaggle: Amazon Review Dataset. Retrieved from https://www.kaggle.com/

[3] Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O'Reilly
Media.

[4] Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12, 2825-2830.

[5] Chen, M., & Liu, Y. (2017). Sentiment Analysis and Opinion Mining. Morgan & Claypool
Publishers.

[6] Streamlit: The fastest way to build custom ML tools. Retrieved from
https://www.streamlit.io/

[7] Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
O'Reilly Media.

[8] OpenAI. (2021). GPT-3.5. Retrieved from https://platform.openai.com/docs/guides/chat.

[9] McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and
IPython. O'Reilly Media.

Note: The above references provide additional information and resources on sentiment analysis,
NLP, machine learning, dataset sources, and tools used in the development of the Amazon
Review (Sentiment Analysis) Application.
