A PROJECT REPORT
Submitted by
DILLIP. K (111418104022)
of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
PRATHYUSHA ENGINEERING COLLEGE
THIRUVALLUR-602 025
JUNE 2022
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “AI TO DETECT SOCIAL MEDIA USERS DEPRESSION POLARITY SCORE AND DIAGNOSE USING AUTO CURATIVE THERAPY” is the bonafide work of DILLIP. K (111418104022), who carried out the project work under my supervision.
SIGNATURE SIGNATURE
Dr. V. VIJAYARAJA, Ms. B. GUNASUNDARI,
PROFESSOR, ASSISTANT PROFESSOR,
HEAD OF THE DEPARTMENT, SUPERVISOR,
Department of CSE, Department of CSE,
Prathyusha Engineering College, Prathyusha Engineering College,
Thiruvallur- 602 025. Thiruvallur- 602 025.
Place: Thiruvallur
Date:
Submitted for the Project Viva-Voce held on ………… at
Prathyusha Engineering College, Thiruvallur- 602 025.
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
CHAPTER NO TITLE PAGE NO
ABSTRACT iv
LIST OF FIGURES ix
LIST OF TABLES x
LIST OF ABBREVIATIONS xi
1 INTRODUCTION
1.1 OVERVIEW 2
1.2 OBJECTIVE 3
2 SYSTEM ANALYSIS
2.1.1 DISADVANTAGES 7
2.2.1 ADVANTAGES 8
3 SYSTEM REQUIREMENTS
3.1 HARDWARE REQUIREMENTS 10
4 SYSTEM DESIGN
5 SYSTEM IMPLEMENTATION
ANNEXURE 35
APPENDIX 2 SCREENSHOT 72
REFERENCES 77
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
LIST OF SYMBOLS

S.NO NOTATION NAME NOTATION DESCRIPTION
1 Class A named box listing attributes marked + (Public), - (Private) and # (Protected) Represents a collection of similar entities grouped together.
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
1.2 OBJECTIVE
We propose a lexicon-enhanced LSTM model. The model first uses a
sentiment lexicon as extra information to pre-train a word sentiment classifier,
and then obtains the sentiment embeddings of words, including words that are not
in the lexicon.
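The idea above can be sketched in a few lines. The dimensions and the embeddings here are hypothetical stand-ins, since the report does not specify them; only the concatenation step is taken from the description:

```python
import numpy as np

# Hypothetical sizes: a 100-dim word embedding and a 50-dim sentiment
# embedding per word (the report does not fix these values).
WORD_DIM, SENT_DIM = 100, 50

def lexicon_enhanced_input(word_vecs, sent_vecs):
    # Concatenate each word embedding with its sentiment embedding,
    # giving the (timesteps, WORD_DIM + SENT_DIM) sequence fed to the LSTM.
    return np.concatenate([word_vecs, sent_vecs], axis=-1)

# A toy 4-word post, with random vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
words = rng.normal(size=(4, WORD_DIM))
sents = rng.normal(size=(4, SENT_DIM))

x = lexicon_enhanced_input(words, sents)
print(x.shape)  # (4, 150)
```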
[1] Luna Ansari and Shaoxiong Ji, “Ensemble Hybrid Learning Methods for
Automated Depression Detection,” January 2022.
In this paper, text classifiers are trained for depression detection. The key
objective is to improve depression detection performance by examining and
comparing two sets of methods: hybrid and ensemble. The purpose of using the
ensemble approach can vary; it is mainly applied for three purposes: decreasing
bias in the model, decreasing variance in the model, and enhancing predictions. The
results show that ensemble models outperform the hybrid model classification
results. The strength and effectiveness of the combined features demonstrate that
better performance can be achieved by multiple feature combinations and proper
feature selection.
The authors mine sequential patterns from depressive user behavior and present
TROAD (Tracing the Roadmap of Depressive Users), a framework for the collection,
pre-processing, modelling, and knowledge discovery of sequential patterns of user
emotions from social networks.
We can use our models to identify the genuinely crucial sentiment features of each
user’s posts and suppress other unimportant information as far as possible. The proposed
models can encode the relations between posts in user representation. Experimental
results showed that our models performed better than the previous state-of-the-art
models on the Reddit Self-Reported Depression Diagnosis dataset, and also
performed well on the Early Detection of Depression dataset.
[5] Marcel Trotzek and Sven Koitka, “Utilizing Neural Networks and Linguistic
Metadata for Early Detection of Depression Indications in Text Sequences,”
September 2020.
This paper addresses the early detection of depression using machine learning
models based on messages on a social platform. In particular, a convolutional neural
network based on different word embeddings is evaluated and compared to a
classification based on user-level linguistic metadata. An ensemble of both
approaches is shown to achieve state-of-the-art results in a current early detection
task. As the results presented in this paper are optimized to obtain the best
performance on the eRisk 2017 task, for comparison to previously published results,
future work will have to show how these models perform on yet unseen data.
CHAPTER 2
SYSTEM ANALYSIS
The system study provides a description of the existing system and the proposed
system. Early intervention for depression can improve employee productivity and
reduce absenteeism. Early detection, intervention, and appropriate treatment can
promote remission and reduce the emotional and financial burdens.
2.1.1 DISADVANTAGES
embedding and its sentiment embedding as the input of LSTM and fine-tune the
word sentiment classifier network.
2.2.1 ADVANTAGES
Outperforms in terms of accuracy, due to the diversity and richness of its
feature set.
Reduces the number of words in the corpus.
Decreases the computation time.
Gives the words strength when fed to the classification model.
Reduces the possibility of overfitting.
CHAPTER 3
SYSTEM REQUIREMENTS
3. INTRODUCTION
Server : Wampserver 2i
3.3 SOFTWARE DESCRIPTION
The user can develop the Tweepy functions using Python 3.7 in addition to
the already supported 2.7 and 3.6 versions. Python 3.7 is the newest major release
of the Python language, and it contains many new features such as support for data
classes, customization of access to module attributes, and typing enhancements. The
Python 3.7 runtime is available in all regions where Tweepy is available.
3.3.1.1 Pandas
pandas is a fast, powerful, flexible and easy to use open source data analysis
and manipulation tool, built on top of the Python programming language. pandas is
a Python package that provides fast, flexible, and expressive data structures designed
to make working with "relational" or "labeled" data both easy and intuitive. It aims
to be the fundamental high-level building block for doing practical, real world data
analysis in Python.
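A minimal example of the kind of labeled-data handling pandas makes easy. The column names here are illustrative, not the project's actual schema:

```python
import pandas as pd

# Tiny labeled-tweet frame; `target` follows the Sentiment140 convention
# (4 = positive, 0 = negative) used by the dataset in this project.
df = pd.DataFrame({
    "tweet": ["feeling great today", "cannot sleep anymore", "had a nice walk"],
    "target": [4, 0, 4],
})

negative = df[df["target"] == 0]      # label-based row selection
counts = df["target"].value_counts()  # quick class-balance check

print(len(negative))  # 1
print(counts[4])      # 2
```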
3.3.1.2 NumPy
NumPy, which stands for Numerical Python, is a library consisting of
multidimensional array objects and a collection of routines for processing those
arrays. Using NumPy, mathematical and logical operations on arrays can be
performed. NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object, and tools for working with these
arrays.
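For instance, the mathematical and logical operations mentioned above look like this; the arrays are made-up probabilities, not project data:

```python
import numpy as np

scores = np.array([[0.9, 0.1],    # per-class probabilities for two samples
                   [0.3, 0.7]])

preds = scores.argmax(axis=1)      # highest-scoring class per row
mask = scores > 0.5                # element-wise logical operation

print(preds.tolist())   # [0, 1]
print(int(mask.sum()))  # 2
```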
3.3.2 MySQL
MySQL is a relational database management system based on the Structured Query
Language (SQL), the most popular language for accessing and managing records in a
database. MySQL is open-source, free software under the GNU license and is
supported by Oracle Corporation. It provides facilities to manage databases and to
manipulate data with the help of various SQL queries: insert records, update
records, delete records, select records, create tables, drop tables, etc.
3.3.3 WampServer
WampServer is a Windows web development environment. It allows you to
create web applications with Apache2, PHP and a MySQL database. Alongside,
phpMyAdmin allows you to easily manage your databases. WampServer is a reliable
web development program that lets you create web apps with a MySQL database,
PHP and Apache2. With an intuitive interface and numerous functionalities, it is a
preferred choice of developers around the world. The software is free to use and
doesn’t require a payment or subscription.
3.3.4 Bootstrap 4
Bootstrap is a free and open-source tool collection for creating responsive
websites and web applications. It is the most popular HTML, CSS, and JavaScript
framework for developing responsive, mobile-first websites. It solves many
long-standing problems, one of which is cross-browser compatibility: websites
built with it render consistently across browsers (IE, Firefox, and Chrome) and
screen sizes (desktops, tablets, phablets, and phones). Bootstrap was created by
Mark Otto and Jacob Thornton of Twitter, and was later declared an open-source
project.
CHAPTER 4
SYSTEM DESIGN
Fig 4.1 System Architecture for AI to Detect Social Media Users Depression
Polarity Score and Diagnose Using Auto Curative Therapy
4.2.1 USE CASE DIAGRAM
Use Cases are used to describe the visible interactions that the system will
have with users and external systems. They are used to describe how a user would
perform their role using the system.
Fig 4.2.1 Use Case Diagram for AI to Detect Social Media Users Depression
Polarity Score and Diagnose Using Auto Curative Therapy
4.2.2 CLASS DIAGRAM
Fig 4.2.2 Class Diagram for AI to Detect Social Media Users Depression Polarity
Score and Diagnose Using Auto Curative Therapy
4.2.3 ACTIVITY DIAGRAM
Fig 4.2.3 Activity Diagram for AI to Detect Social Media Users Depression
Polarity Score and Diagnose Using Auto Curative Therapy
4.2.4 SEQUENCE DIAGRAM
A Sequence diagram is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. A sequence diagram shows object interactions arranged in time
sequence. It depicts the objects and classes involved in the scenario and the sequence
of messages exchanged between the objects needed to carry out the functionality of
the scenario.
Fig 4.2.4 Sequence Diagram for AI to Detect Social Media Users Depression
Polarity Score and Diagnose Using Auto Curative Therapy
4.2.5 COLLABORATION DIAGRAM
Another type of interaction diagram is the collaboration diagram. A
collaboration diagram represents a collaboration, which is a set of objects related in
a particular context, and an interaction, which is a set of messages exchanged among
the objects within the collaboration to achieve a desired outcome.
Fig 4.2.5 Collaboration Diagram for AI to Detect Social Media Users Depression
Polarity Score and Diagnose Using Auto Curative Therapy
CHAPTER 5
SYSTEM IMPLEMENTATION
The existing users of Facebook will also have to upload a scanned copy of their
Aadhar Card. If they fail to do so, their profile will be suspended within the next 15
days.
5.2.1.2 Depression Analysis API
In this module we developed the API for Depression analytics on chat or post
user data. It focuses on keywords and analyzes chat or post according to a two-pole
scale (positive and negative).
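A minimal sketch of such a two-pole scoring step, assuming a tiny hand-made lexicon; the real API's lexicon and weighting are not given in this report:

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"happy": 1, "love": 1, "great": 1,
           "sad": -1, "alone": -1, "hopeless": -1}

def polarity_score(text):
    # Sum of lexicon weights over the tokens of a chat message or post:
    # negative totals lean toward the depressive pole, positive the opposite.
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

print(polarity_score("i feel hopeless and alone"))  # -2
print(polarity_score("love this great day"))        # 2
```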
After the data pre-processing step, the next essential step is feature selection
on the refined dataset. Supervised deep learning classifiers require textual data
in vector form in order to be trained. In this project the textual features are
converted into vector form using TF and TF-IDF techniques. Feature extraction
techniques not only convert textual features into vector form but also help to find
the significant features necessary to make predictions, since for the most part not
all features contribute to the prediction of the target class. Because every
document varies in length, a term is likely to appear more often in a long document
than in a short one, so the raw count is normalized:

TF(t) = (No. of times term t appears in a document) / (Total no. of terms in the document)
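The normalization above can be written directly. This is a plain-Python sketch of the TF term only; TF-IDF additionally weights each term by its inverse document frequency:

```python
from collections import Counter

def term_frequency(term, document):
    # TF(t) = occurrences of t in the document / total terms in the document
    tokens = document.lower().split()
    return Counter(tokens)[term] / len(tokens)

doc = "i feel sad today very sad"
print(round(term_frequency("sad", doc), 3))  # 0.333
```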
CHAPTER 6
TESTING
Unit testing is performed for testing modules against detailed design. Inputs to the
process are usually compiled modules from the coding process. Each module is
assembled into a larger unit during the unit testing process. Testing has been performed
on each phase of projects design and coding. We carry out testing of the module
interface to ensure the proper flow of information in and out of the unit while testing.
We make sure that the live streaming data fetched from Twitter retains its integrity
throughout the algorithm’s execution by examining the natural language processing steps.
6.1.2 TEST CASES OF TOPIC MODELING AND EMOTION ANALYSIS

TEST CASE TC002
Description: Login
Precondition: Registration must be done previously.
Expected result: User can log in to the application.
Actual result: Successfully logged in.
Status: PASS

TEST CASE TC005
Description: Depression Detection
Precondition: User is logged in to the system and enters the required inputs.
Expected result: Automatically extract social media news feed and classify it.
Actual result: Successfully shows the result.
Status: PASS
6.2 INTEGRATION TESTING
The listed tests were conducted by the admin manually at the various
development stages. Unit testing was conducted and errors are changed. The
integration testing is performed after the system integration of the web page with
the emotions portrayed and errors are rectified systems like accurately labelling
tweets etc. The results were analyzed, and the appropriate alterations were made.
The test results proved to be positive and henceforth the application is feasible, and
test approved.
CHAPTER 7
RESULTS AND DISCUSSION
7.1 RESULTS
7.2 DISCUSSION
The results and comparisons of the different classifiers after training and
testing are presented in this section. We gathered 49,799 tweets from the online
resource Kaggle and translated them into English using the Python library
Googletrans, which uses the Google Translate Ajax API. 42,797 tweets were used to
train the various ML and DL models, and about seven thousand tweets were used for
testing in order to quantify accuracy and the other assessment metrics. We
evaluated accuracy, precision, recall, and F-measure for LR, XGBM, Naive Bayes,
LSTM-CNN and BiLSTM. Finally, a comparison of the models using various graphs is
presented below. The findings in Table 4 show that the deep learning algorithm
(BiLSTM) is the stronger method for depression tweet classification, with a high
accuracy of 98.4%.
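The evaluation measures referred to above reduce to the standard definitions. This sketch spells them out on toy labels (1 = depressed), independent of any particular classifier; sklearn.metrics provides the same quantities:

```python
def evaluate(y_true, y_pred):
    # Standard binary-classification measures: accuracy, precision, recall, F1.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

acc, prec, rec, f1 = evaluate([1, 0, 1, 1], [1, 0, 0, 1])
print(acc, prec, round(rec, 3), round(f1, 3))  # 0.75 1.0 0.667 0.8
```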
CHAPTER 8
8.1 CONCLUSION
Depression is a pressing issue of our society today, affecting more than 264
million people globally and still increasing. It may become a serious health issue
when it lasts more than two weeks with moderate or severe intensity. It can even
lead to suicide if a depressed person is not receiving proper treatment. This work
aims to make timely depression intensity estimation from social media, in order to
aid in proper treatment according to the depression level. Detecting and classifying
depression levels is critical across different types of posts. In this project, we
proposed a framework for classifying depression levels using the LE-LSTM algorithm.
By using the sentiment lexicon to train a word sentiment classifier, we obtain the
sentiment embedding of each word. Concatenating the word embedding and its
sentiment embedding as the input of the LE-LSTM yields higher performance in
sentiment analysis tasks. The algorithms are designed to analyze tweets for
emotion detection as well as for detection of suicidal thoughts among people on
social media. We validated the performance of our method by conducting extensive
experiments on a standard dataset and outperformed the other alternatives for
polarity estimation.
The framework can also be applied to analyzing other mental health disorders,
such as anxiety, schizophrenia, and bipolar disorder, among others. This project
will lead to the development of automatic preliminary assessment methods based on
social data.
ANNEXURE
APPENDIX 1
SOURCE CODE:
Main.py
from flask import Flask, render_template, Response, redirect, request, session, abort, url_for
import datetime
import random
import warnings
import numpy as np
#nltk.download()
import os
import time
import shutil
import hashlib
import urllib.request
import urllib.parse
from urllib.request import urlopen
import webbrowser
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="root",
passwd="",
charset="utf8",
database="depression_social_media"
)
app = Flask(__name__)
##session key
app.secret_key = 'abcdef'
#######
UPLOAD_FOLDER = 'static/upload'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
#####
#####
print("yes")
else:
print("no")'''
if request.method == 'POST':
username1 = request.form['uname']
password1 = request.form['pass']
mycursor = mydb.cursor()
myresult = mycursor.fetchone()[0]
if myresult>0:
session['username'] = username1
return redirect(url_for('userhome'))
else:
return render_template('index.html',msg=msg,act=act)
@app.route('/register',methods=['POST','GET'])
def register():
result=""
act=""
mycursor = mydb.cursor()
if request.method=='POST':
name=request.form['name']
gender=request.form['gender']
email=request.form['email']
uname=request.form['uname']
pass1=request.form['pass']
now = datetime.datetime.now()
rdate=now.strftime("%d-%m-%Y")
cnt = mycursor.fetchone()[0]
if cnt==0:
maxid = mycursor.fetchone()[0]
if maxid is None:
maxid=1
act="success"
mycursor.execute(sql, val)
mydb.commit()
act="success"
return redirect(url_for('index'))
else:
act="wrong"
return render_template('register.html',act=act)
@app.route('/login',methods=['POST','GET'])
def login():
cnt=0
act=""
msg=""
if request.method == 'POST':
username1 = request.form['uname']
password1 = request.form['pass']
mycursor = mydb.cursor()
myresult = mycursor.fetchone()[0]
if myresult>0:
session['username'] = username1
return redirect(url_for('admin'))
else:
return render_template('login.html',msg=msg,act=act)
@app.route('/userhome', methods=['GET', 'POST'])
def userhome():
uname=""
photo=""
fn=""
if 'username' in session:
uname = session['username']
name=""
print(uname)
now = datetime.datetime.now()
rdate=now.strftime("%d-%m-%Y")
mycursor = mydb.cursor()
value = mycursor.fetchone()
name=value[1]
if request.method=='POST':
detail=request.form['detail']
file = request.files['file']
sg1=sg.split('|')
sgg=len(sg1)
sn=sgg-1
rn=randint(1,sn)
sugges=sg1[rn]
nx=0
ddf=[]
ddf.append(fname)
nx+=1
rn2=randint(1,nx)
fn=ddf[rn2]
motive=fn
elif deptype==3:
print("mail")
cm1 = mycursor.fetchone()[0]
if cm1>0:
mycursor.execute("SELECT * FROM ds_contact where cname=%s",(uname,))
fdd2 = mycursor.fetchone()
un=fdd2[1]
if cn==una or un==una:
ss=""
else:'''
#sd.append(una)
ss=""
else:
sd.append(una)
ln=len(sd)
fdata=[]
if ln>0:
for sr in sd:
srow = mycursor.fetchone()
ffd=[]
ffd.append(srow[0])
ffd.append(srow[1])
ffd.append(srow[2])
ffd.append(srow[3])
ffd.append(srow[4])
ffd.append(srow[5])
ffd.append(srow[6])
ffd.append(srow[7])
ffd.append(srow[8])
ffd.append(srow[9])
fdata.append(ffd)
return render_template('userhome.html',value=value,data=data,fdata=fdata,fn=fn)
def cover_photo():
uname=""
photo=""
if 'username' in session:
uname = session['username']
name=""
mycursor = mydb.cursor()
value = mycursor.fetchone()
if request.method=='POST':
file = request.files['file2']
try:
if file.filename == '':
return redirect(request.url)
if file:
fn=file.filename
photo="C"+str(value[0])+fn
#fn1 = secure_filename(fn)
file.save(os.path.join(app.config['UPLOAD_FOLDER'], photo))
mydb.commit()
maxid=1
sql = "INSERT INTO ds_comment(id,uname,pid,comment,rdate,name) VALUES (%s, %s, %s, %s, %s, %s)"
val = (maxid,uname,pid,comment,rdate,name)
mycursor.execute(sql, val)
mydb.commit()
return redirect(url_for('userhome'))
return render_template('userhome.html',value=value)
def send_req():
uname=""
photo=""
user = request.args.get('user')
act = request.args.get('act')
if 'username' in session:
uname = session['username']
name=""
now = datetime.datetime.now()
rdate=now.strftime("%d-%m-%Y")
mycursor = mydb.cursor()
value = mycursor.fetchone()
name=value[1]
if act=="req":
maxid = mycursor.fetchone()[0]
if maxid is None:
maxid=1
val = (maxid,uname,user,'0')
mycursor.execute(sql, val)
mydb.commit()
maxid2=maxid+1
val = (maxid2,user,uname,'0')
mycursor.execute(sql, val)
mydb.commit()
return redirect(url_for('userhome'))
return render_template('userhome.html',value=value)
def contact():
uname=""
photo=""
act = request.args.get('act')
cname = request.args.get('cname')
if 'username' in session:
uname = session['username']
name=""
print(uname)
now = datetime.datetime.now()
rdate=now.strftime("%d-%m-%Y")
mycursor = mydb.cursor()
value = mycursor.fetchone()
###Frnd
mycursor.execute("SELECT * FROM ds_contact where uname=%s && status=1",(uname, ))
fdd = mycursor.fetchall()
fdata=[]
for rd in fdd:
fdd1 = mycursor.fetchone()
dd=[]
dd.append(fdd1[1])
dd.append(fdd1[4])
dd.append(fdd1[6])
dd.append(fdd1[7])
fdata.append(dd)
flen=len(fdata)
#######Req
fdd2 = mycursor.fetchall()
fdata2=[]
for rd2 in fdd2:
fdd11 = mycursor.fetchone()
dd1=[]
dd1.append(fdd11[1])
dd1.append(fdd11[4])
dd1.append(fdd11[6])
dd1.append(fdd11[7])
fdata2.append(dd1)
flen2=len(fdata2)
##########
if act=="ok":
mydb.commit()
mydb.commit()
return redirect(url_for('contact'))
return render_template('contact.html',value=value,fdata=fdata,fdata2=fdata2,flen=flen,flen2=flen2)
def messages():
uname=""
photo=""
act = request.args.get('act')
cname = request.args.get('cname')
if 'username' in session:
uname = session['username']
name=""
print(uname)
now = datetime.datetime.now()
rdate=now.strftime("%d-%m-%Y")
mycursor = mydb.cursor()
value = mycursor.fetchone()
###Frnd
mycursor.execute("SELECT * FROM ds_contact where uname=%s && status=1",(uname, ))
fdd = mycursor.fetchall()
fdata=[]
for rd in fdd:
fdd1 = mycursor.fetchone()
dd=[]
dd.append(fdd1[1])
dd.append(fdd1[4])
dd.append(fdd1[6])
dd.append(fdd1[7])
fdata.append(dd)
flen=len(fdata)
#####
mdata=[]
value2=[]
if act=="chat":
value2 = mycursor.fetchone()
mdata = mycursor.fetchall()
value3=[]
if act=="mess":
if request.method=='POST':
message=request.form['message']
value3 = mycursor.fetchone()
maxid = mycursor.fetchone()[0]
if maxid is None:
maxid=1
val = (maxid,uname,cname,message)
mycursor.execute(sql, val)
mydb.commit()
return redirect(url_for('messages',act='chat',cname=cname))
return render_template('messages.html',value=value,fdata=fdata,value2=value2,act=act,mdata=mdata)
def admin():
uname=""
if 'username' in session:
uname = session['username']
name=""
print(uname)
return render_template('admin.html')
def process():
uname=""
if 'username' in session:
uname = session['username']
pd.set_option("display.max_colwidth", 200)
data = pd.read_csv("static/data/training.1600000.processed.noemoticon.csv",encoding='latin-1')
dat=data.head()
data1=[]
for ss in dat.values:
data1.append(ss)
data.columns = DATASET_COLUMNS
dat2=data.head()
data2=[]
data2.append(ss2)
dat3=data.head()
data3=[]
data3.append(ss3)
positif_data = data[data.target==4].iloc[:25000,:]
print(positif_data.shape)
negative_data = data[data.target==0].iloc[:1000,:]
print(negative_data.shape)
data = pd.concat([positif_data,negative_data],axis = 0)
print(data.shape)
dat4=data.head()
data4=[]
data4.append(ss4)
return render_template('process.html',data1=data1,data2=data2,data3=data3,data4=data4)
def process2():
uname=""
if 'username' in session:
uname = session['username']
pd.set_option("display.max_colwidth", 200)
data = pd.read_csv("static/data/training.1600000.processed.noemoticon.csv",encoding='latin-1')
data.head()
data.columns = DATASET_COLUMNS
data.head()
data.head()
positif_data = data[data.target==4].iloc[:25000,:]
print(positif_data.shape)
negative_data = data[data.target==0].iloc[:1000,:]
print(negative_data.shape)
data = pd.concat([positif_data,negative_data],axis = 0)
print(data.shape)
dat=data.head()
data1=[]
for ss in dat.values:
data['Clean_TweetText'] = data['Clean_TweetText'].str.replace(r"http\S+", "")
dat2=data.head()
data2=[]
data2.append(ss2)
dat3=data.head()
data3=[]
data3.append(ss3)
stopwords=nltk.corpus.stopwords.words('english')
def remove_stopwords(text):
return clean_text
dat4=data.head()
data4=[]
for ss4 in dat4.values:
data4.append(ss4)
dat5=data.head()
data5=[]
data5.append(ss5)
stemmer = PorterStemmer()
data['Clean_TweetText'] = data['Clean_TweetText'].apply(lambda x:
[stemmer.stem(i) for i in x])
dat6=data.head()
data6=[]
dat7=data.head()
data7=[]
dat8=data.head()
data8=[]
data8.append(ss8)
return render_template('process2.html',data1=data1,data2=data2,data3=data3,data4=data4,data5=data5,data6=data6,data7=data7,data8=data8)
@app.route('/process3', methods=['GET', 'POST'])
def process3():
uname=""
if 'username' in session:
uname = session['username']
pd.set_option("display.max_colwidth", 200)
data = pd.read_csv("static/data/training.1600000.processed.noemoticon.csv",encoding='latin-1')
data.head()
data.columns = DATASET_COLUMNS
data.head()
positif_data = data[data.target==4].iloc[:25000,:]
print(positif_data.shape)
negative_data = data[data.target==0].iloc[:1000,:]
print(negative_data.shape)
data = pd.concat([positif_data,negative_data],axis = 0)
print(data.shape)
data.head()
data.head()
data.head()
stopwords=nltk.corpus.stopwords.words('english')
def remove_stopwords(text):
clean_text=' '.join([word for word in text.split() if word not in stopwords])
return clean_text
plt.show()'''
'''plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()'''
'''count_vectorizer = CountVectorizer(stop_words='english')
cv = count_vectorizer.fit_transform(data['Clean_TweetText'])
cv.shape
X_train,X_test,y_train,y_test = train_test_split(cv,data['target'] ,
test_size=.2,stratify=data['target'], random_state=42)
xgbc.fit(X_train,y_train)
prediction_xgb = xgbc.predict(X_test)
print(accuracy_score(prediction_xgb,y_test))
rf = RandomForestClassifier(n_estimators=1000, random_state=42)
rf.fit(X_train,y_train)
prediction_rf = rf.predict(X_test)
print(accuracy_score(prediction_rf,y_test))
lr = LogisticRegression()
lr.fit(X_train,y_train)
prediction_lr = lr.predict(X_test)
print(accuracy_score(prediction_lr,y_test))
svc = svm.SVC()
svc.fit(X_train,y_train)
prediction_svc = svc.predict(X_test)
print(accuracy_score(prediction_svc,y_test))'''
filename2 = 'static/data/lexicon_enhance.csv'
dat22 = list(dat2.values.flatten())
cnt2=0
sd2=len(dat2)
data2=[]
if i<30:
cnt2=len(ss2)
data2.append(ss2)
i+=1
cols2=cnt2
return render_template('process4.html',data2=data2)
##LE-LSTM
amount_of_features = len(stock.columns)
sequence_length = seq_len + 1
result = []
result = np.array(result)
train = result[:int(row), :]
def build_model(layers):
model = Sequential()
model.add(LSTM(
input_dim=layers[0],
output_dim=layers[1],
return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
layers[2],
return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1,init='uniform',activation='linear'))
model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
return model
def classify_data():
nondep_len=0
tot_len=0
dep_len=0
'''dat1 = pd.read_csv("static/data/training.1600000.processed.noemoticon.csv",
header=0)
tot_len=len(dat1)
filename2 = 'static/data/feature_extracted_depressed_twits.csv'
dat22 = list(dat2.values.flatten())
i=0
cnt2=0
dep_len=len(dat2)
nondep_len=tot_len-dep_len
data2=[]
if i<30:
cnt2=len(ss2)
data2.append(ss2)
i+=1
cols2=cnt2
#############
'''
ff=open("static/data/det1.txt","r")
det=ff.read()
ff.close()
value=det.split(",")
tot_len=value[0]
dep_len=value[1]
dd2=[int(dep_len),int(nondep_len)]
dd1=['Depressed','Non-Depressed']
values = dd2 #list(data.values())
plt.ylim((1,int(tot_len)))
plt.xlabel("Class")
plt.ylabel("Count")
plt.title("")
rr=randint(100,999)
fn="graph4.png"
plt.xticks(rotation=20)
plt.savefig('static/data/'+fn)
#plt.close()
plt.clf()
#########
v1=int(value[3])
v2=int(value[4])
v3=int(value[5])
dep=int(dep_len)
#graph5
dd2=[v3,v2,v1]
ax=dd2
dd1=['Highly','Medium','Low']
c=['red','blue','green']
plt.bar(dd1, dd2, color=c, width=0.4)
plt.ylim((1,dep))
plt.xlabel("Depression Level")
plt.ylabel("Count")
plt.title("")
rr=randint(100,999)
fn="graph5.png"
plt.xticks(rotation=20)
plt.savefig('static/data/'+fn)
#plt.close()
APPENDIX 2
SCREENSHOTS
TRAINING PHASE
OUTPUT
REFERENCES