WAH CAMPUS
(Project Report)
Submitted by:
Mian M. Shoaib Mehboob (FA18-BSE-7B-008)
M. Kawish Feroz (FA18-BSE-7B-073)
Fahad Ali (FA18-BSE-7B-076)
Submitted To:
Dr. Hikmat Ullah Khan
Submission date:
20th December, 2021
Project Title:
Sentiment Analysis of Amazon Product Reviews using Python
1. Introduction and Aim:
In today’s world, sentiment analysis can play a vital role in almost any industry. Automatically
classifying tweets, Facebook comments, or product reviews saves a great deal of time and money
compared with labeling them by hand, and it can also reduce the errors of manual labeling.
Our aim is to perform sentiment analysis on Amazon product reviews. We first scrape data on
two selected products and then scrape the reviews of these products. We used Python and a
few Python libraries to perform the scraping and to apply algorithms for sentiment analysis.
2. Selected Algorithm:
We used a supervised learning algorithm for classification, namely the Naive Bayes
classifier.
Classification:
Classification is the process of predicting the class of given data points. Classes are sometimes
called targets, labels, or categories. Classification predictive modeling is the task of
approximating a mapping function (f) from input variables (X) to discrete output variables (y).
Naïve Bayes
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem.
Naive Bayes is not a single algorithm but a family of algorithms that all share a common
principle: every pair of features being classified is assumed to be independent of each other,
given the class.
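In symbols (standard notation, not taken from the original report): let X = (x1, ..., xn) be the features of one data point, for example the words of a review, and y its class, matching the X and y of the classification definition. Bayes' theorem gives the posterior probability of a class, the naive independence assumption factorizes the likelihood into one term per feature, and the classifier picks the class with the highest posterior.

```latex
P(y \mid x_1, \dots, x_n) = \frac{P(y)\,P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)

\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator P(x1, ..., xn) is the same for every class, so it does not change which class wins and is dropped from the decision rule.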
Advantages:
It is simple and easy to implement
It requires relatively little training data
It handles both continuous and discrete data
It is highly scalable with the number of predictors and data points
It is fast and can be used to make real-time predictions
It is relatively insensitive to irrelevant features
Disadvantages:
Naive Bayes assumes that all predictors (features) are independent of each other given the
class, which rarely happens in real life. This limits the applicability of the algorithm in
real-world use cases.
This algorithm faces the ‘zero-frequency problem’: it assigns zero probability to a categorical
variable whose category appears in the test data set but never appeared in the training data
set. A smoothing technique, such as Laplace smoothing, overcomes this issue.
Its probability estimates can be poorly calibrated, so its probability outputs should not be taken
too literally, even when its predicted classes are correct.
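The zero-frequency problem and the smoothing fix can be seen in a minimal multinomial Naive Bayes (the variant usually applied to word counts) written from scratch. This is an illustrative sketch, not the project's code: the toy reviews, the whitespace tokenizer, and the helper names `train_nb`, `class_scores`, and `predict_nb` are hypothetical, and the smoothing shown is add-alpha smoothing (alpha=1 is Laplace smoothing).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing.

    alpha=1.0 is Laplace smoothing; alpha=0.0 disables smoothing.
    """
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    priors, likelihoods = {}, {}
    for c in class_docs:
        priors[c] = class_docs[c] / len(docs)  # P(y)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        # P(word | y) for every word in the training vocabulary
        likelihoods[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return priors, likelihoods


def class_scores(model, text):
    """Return log P(y) + sum of log P(x_i | y) for every class y.

    A zero likelihood yields -inf, i.e. that class is ruled out entirely.
    """
    priors, likelihoods = model
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for w in tokenize(text):
            if w not in likelihoods[c]:
                continue  # ignore words never seen anywhere in training
            p = likelihoods[c][w]
            score += math.log(p) if p > 0 else float("-inf")
        scores[c] = score
    return scores


def predict_nb(model, text):
    scores = class_scores(model, text)
    return max(scores, key=scores.get)


# Toy labelled reviews (hypothetical data, for illustration only).
docs = ["great phone great battery", "love the camera",
        "bad battery", "poor build bad camera"]
labels = ["pos", "pos", "neg", "neg"]

unsmoothed = train_nb(docs, labels, alpha=0.0)
smoothed = train_nb(docs, labels, alpha=1.0)

review = "great camera but poor battery"
print(class_scores(unsmoothed, review))  # both classes score -inf
print(class_scores(smoothed, review))    # finite scores for both classes
print(predict_nb(smoothed, review))      # 'pos' on this toy data
```

Without smoothing, "poor" never occurs in a positive review and "great" never occurs in a negative one, so each class gets a zero factor and the comparison becomes meaningless. With alpha=1 every vocabulary word keeps a small non-zero probability. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long reviews.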
3. Dataset:
We scraped the reviews of two products from the Amazon website using Python.
Data Source:
Product 1 link:
https://www.amazon.in/OnePlus-Mirror-Black-128GB-Storage/product-reviews/B07DJHV6VZ/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews%27
Product 2 link:
https://www.amazon.com/adidas-Mens-Questar-White-Black/dp/B08779MB6H/ref=sr_1_42?keywords=shoes&qid=1639316391&s=fashion-mens-intl-ship&sr=1-42&th=1
Statistic Table:
Averaged Values:
Final Result:
8. Appendix:
Scraping Using the BeautifulSoup Library:
Import Link and Check Response
Data Cleaning
Scrape Reviews
Store Data in a Table
Export the Table to a CSV File (Excel-Compatible) and Save It to the Device
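The appendix steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual script: it assumes the `requests` and `beautifulsoup4` packages, and it assumes Amazon marks each review with `data-hook` attributes such as `review`, `review-title`, `review-star-rating`, and `review-body`. Amazon's markup changes over time and its pages may block automated requests, so these selectors, the browser-like User-Agent, and the helper names `fetch_page`, `clean_text`, `parse_reviews`, and `save_csv` are all hypothetical. To stay runnable offline, the example parses a small inline HTML sample instead of a live page.

```python
import csv
import re

from bs4 import BeautifulSoup


def fetch_page(url):
    """Step 1: download a page and check the HTTP response (not called below)."""
    import requests  # imported here so the offline demo does not need it

    headers = {"User-Agent": "Mozilla/5.0"}  # assumption: a browser-like UA
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises an error on a 4xx/5xx status
    return response.text


def clean_text(text):
    """Step 2: collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def parse_reviews(html):
    """Step 3: extract one row per review (the data-hook selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.find_all(attrs={"data-hook": "review"}):
        title = review.find(attrs={"data-hook": "review-title"})
        rating = review.find(attrs={"data-hook": "review-star-rating"})
        body = review.find(attrs={"data-hook": "review-body"})
        rows.append({
            "title": clean_text(title.get_text()) if title else "",
            "rating": clean_text(rating.get_text()) if rating else "",
            "body": clean_text(body.get_text()) if body else "",
        })
    return rows


def save_csv(rows, path):
    """Steps 4-5: store the rows as a table and save it as a CSV file."""
    # utf-8-sig adds a byte-order mark so Excel detects the UTF-8 encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "rating", "body"])
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical HTML sample standing in for a downloaded review page.
SAMPLE_HTML = """
<div data-hook="review">
  <a data-hook="review-title"><span>Great phone</span></a>
  <i data-hook="review-star-rating"><span>5.0 out of 5 stars</span></i>
  <span data-hook="review-body"><span>  Battery lasts
      all day.  </span></span>
</div>
<div data-hook="review">
  <a data-hook="review-title"><span>Not worth it</span></a>
  <i data-hook="review-star-rating"><span>2.0 out of 5 stars</span></i>
  <span data-hook="review-body"><span>Screen cracked in a week.</span></span>
</div>
"""

rows = parse_reviews(SAMPLE_HTML)
save_csv(rows, "reviews.csv")
```

For a live run, the sample string would be replaced by `fetch_page(url)` with one of the product links listed under Data Source. The standard-library `csv` module is used here for the export step; a pandas DataFrame with `to_csv` would be an equally valid way to build the table.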