
DECLARATION

I hereby declare that this report submission is my own work and that, to
the best of my knowledge and belief, it contains no material previously published
or written by another person nor material which has been accepted for the
award of any other degree or diploma of the university or other institute of higher
learning, except where due acknowledgment has been made in the text.

Place:
Date:

Signature of the candidate


Name:

Reg. No. 2014013744


Roll No. 140132018

CERTIFICATE

Certified that this thesis entitled "Sentiment Analysis Using Hybrid Cluster
and Predict Model" is the bona fide work of Mr. KAMAL SINGH, who carried out
the project work under my supervision. Certified further that, to the best of my
knowledge, the work reported herein does not form part of any other project
report or dissertation on the basis of which a degree or award was conferred on
an earlier occasion on this or any other candidate.

Signature of Supervisor
Mr. Mukul Varshney

The M.Tech. Viva-Voce Examination of Mr./Ms. ................................ has been
held on ................................

Signature of External Examiner

Head of the Department/Program Coordinator


ABSTRACT

Over the past decade, the use of online resources has grown exponentially, in
particular social media and microblogging websites such as Facebook, Twitter and
YouTube, and mobile messaging applications such as WhatsApp and Line. Many
companies have identified these resources as a rich source of marketing
knowledge, because they provide valuable feedback that helps the companies
develop the next generation of their products. In this report, sentiment analysis
of Apple products is performed by extracting tweets about the products and
classifying each tweet as positive or negative feedback. We propose a hybrid
approach that uses k-medoids clustering to form clusters and then applies a
supervised learning technique, the CART (Classification and Regression Trees)
method, to make predictions within those clusters.
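The hybrid cluster-then-predict idea described above can be illustrated with a short Python sketch. The report's own implementation was done in R Studio and is not reproduced here. Everything below rests on stated assumptions. Tweets are taken to be already converted into small numeric feature vectors, and the two toy features (counts of positive and negative words) are hypothetical. The k-medoids step is the simple alternating (Voronoi-iteration) variant, not the full PAM swap search, and it uses Manhattan distance with the first k points as initial medoids. Each cluster is then given a minimal CART-style tree: binary splits chosen by Gini impurity, limited in depth, with no pruning. All function names are hypothetical.

```python
from collections import Counter

def dist(a, b):
    # Manhattan distance between two numeric feature vectors (assumed metric)
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoid_idx):
    # Assignment step: index of the nearest medoid for every point (ties go to the lower index)
    return [min(range(len(medoid_idx)), key=lambda c: dist(p, points[medoid_idx[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    # Alternating k-medoids (not full PAM); hypothetical initialisation with the first k points
    medoid_idx = list(range(k))
    for _ in range(max_iter):
        labels = assign(points, medoid_idx)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoid_idx[c]]
            # Update step: the member with the smallest total distance to the rest of its cluster
            updated.append(min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if updated == medoid_idx:
            break
        medoid_idx = updated
    return medoid_idx, assign(points, medoid_idx)

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_cart(X, y, depth=0, max_depth=3):
    # Leaf: the node is pure or the depth limit is reached, so predict the majority class
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no threshold separates the rows
        return majority(y)
    _, f, t, left, right = best
    # Internal node: (feature index, threshold, left subtree, right subtree)
    return (f, t,
            build_cart([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_cart([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(node, x):
    # Walk down until a leaf (a class label rather than a tuple) is reached
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_hybrid(X, y, k=2, max_depth=3):
    # Cluster first, then train one tree on the rows of each non-empty cluster
    medoid_idx, labels = k_medoids(X, k)
    trees = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            trees[c] = build_cart([X[i] for i in idx], [y[i] for i in idx], max_depth=max_depth)
    return [X[m] for m in medoid_idx], trees

def predict_hybrid(model, x):
    # Route x to its nearest medoid, then let that cluster's tree classify it
    centres, trees = model
    c = min(trees, key=lambda j: dist(x, centres[j]))
    return predict_tree(trees[c], x)

# Hypothetical toy features: [positive-word count, negative-word count]
X = [[3, 0], [2, 1], [4, 1], [0, 3], [1, 4], [0, 2], [5, 0], [1, 5]]
y = ["positive", "positive", "positive", "negative",
     "negative", "negative", "positive", "negative"]
model = fit_hybrid(X, y, k=2)
print(predict_hybrid(model, [4, 0]))  # positive
```

A new tweet is first routed to its nearest medoid and then classified by that cluster's own tree. This is the sense in which clustering and prediction are combined. On real tweet data the clusters would not be pure, so each tree would contain actual splits instead of a single leaf.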


ACKNOWLEDGEMENT
I take this opportunity to thank Mr. Mukul Varshney, my project guide, whose
valuable inputs helped me complete this report.
I express my profound gratitude and sincere thanks to Prof. Ishan Ranjan
(Head of the Department), Department of Computer Science and Engineering,
Sharda University, Greater Noida, U.P., INDIA. It was very inspiring and
enriching for me to work with such an enlightened and disciplined personality.
I also wish to express my sincere thanks to Dr. Manoj Kumar Gupta (Program
Coordinator) for his continued help and support in completing this report.
Last but not least, I wish to thank my friends for their continuous support.

KAMAL SINGH


LIST OF TABLES
Table 2.1 Performance of lexical approach variants ................... 16
Table 2.2 Performance of machine learning approach variants .......... 17
Table 2.3 Summary of literature survey ............................... 19
Table 4.4 Comparison of various classification algorithms ............ 34

LIST OF FIGURES

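2.1 Generic architecture of a lexical approach classifier ............ 12
2.2 Generic architecture of a machine learning approach classifier ... 14
3.1 Flow diagram of the proposed model ............................... 23
4.1 Confusion matrix ................................................. 32
4.2 ROC curve for cluster 1 .......................................... 33
4.3 ROC curve for cluster 2 .......................................... 34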

TABLE OF CONTENTS

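Declaration
Certificate .......................................................... ii
Abstract ............................................................. iii
Acknowledgement ...................................................... iv
List of Tables
List of Figures

CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Objective
1.3 Motivation and Goals

CHAPTER 2: LITERATURE SURVEY
2.1 Issues in Sentiment Analysis
2.2 Classification of Approaches
2.2.1 Knowledge-based Approach
2.2.2 Relationship-based Approach
2.2.3 Language Models Approach
2.2.4 Discourse Structures and Semantics Approach
2.3 Twitter-Specific Approaches
2.3.1 Lexical Analysis Approach ....................................... 10
2.3.2 Machine Learning Approach ....................................... 12
2.3.3 Hybrid Approach ................................................. 13
2.4 Performance Review ................................................ 14
2.4.1 Lexical Approach Performance .................................... 14
2.4.2 Machine Learning Approach Performance ........................... 15
2.4.3 Hybrid Approach Performance ..................................... 16
2.5 Research Gap ...................................................... 17

CHAPTER 3: METHODOLOGY
3.1 R Studio .......................................................... 20
3.2 Training Data ..................................................... 20
3.3 Test Data ......................................................... 20
3.4 Obtaining Raw Data ................................................ 20
3.5 Process Flow ...................................................... 21
3.6 Steps of the k-Medoids Clustering Algorithm ....................... 22
3.7 Prediction Algorithms ............................................. 22
3.7.1 Logistic Regression ............................................. 22
3.7.2 Random Forest Algorithm ......................................... 23
3.7.3 CART Method ..................................................... 24
3.8 Feature Extraction ................................................ 25
3.9 Evaluation ........................................................ 29

CHAPTER 4: EXPERIMENT
4.1 Data Sets ......................................................... 30
4.2 Experimental Results .............................................. 30

CHAPTER 5: CONCLUSION
5.1 Conclusion ........................................................ 33

CHAPTER 6: FUTURE EXTENSIONS
6.1 Future Extensions ................................................. 37

References ............................................................ 38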
