FINAL REPORT
GROUP MEMBERS :
2.CHRIS LAZER-17BCE2160
SLOT : G1
SCHOOL :SCOPE
ABSTRACT
KEYWORDS
INTRODUCTION
CODE AND RESULTS
CONCLUSION
REFERENCES
Abstract:
Suicide is one of the major causes of death across the world. Data is generated
in huge quantities every second through media such as social networking sites
and surveys, so a lot of relevant information is available for suicide
analysis. Data from social networking sites, especially Twitter, has been
extensively used in research to automate suicide prediction with various
machine learning and text mining techniques. Apart from social media analysis,
socio-economic and cultural factors have been studied to find the reasons that
drive people towards suicide. A lot of research has focused on social media
posts and surveys, but research on real-time data is still at an inchoate
stage. This paper aims to elucidate the factors responsible for suicidal
ideation and the techniques and algorithms used to automate suicide prediction,
and to note the issues and challenges faced by existing research in order to
set out the requirements of future research.
Introduction:
This report presents suicide statistics for all the countries in the world and
applies machine learning algorithms, data analysis and data visualization to
find interesting patterns, possible solutions and clues about suicides.
CODE AND RESULTS:
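import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
# reading the WHO suicide statistics dataset
data = pd.read_csv('Desktop/Data_Visualization_project//who_suicide_statistics.csv')
print(data.shape)
data.head()
# correlation plot (numeric columns only)
fig, ax = plt.subplots()
corr = data.corr(numeric_only = True)
sns.heatmap(corr, mask = np.zeros_like(corr, dtype = bool),
            cmap = sns.diverging_palette(3, 3, as_cmap = True), square = True, ax = ax)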
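# renaming the columns
# (assumption: the raw WHO file names these columns 'sex' and 'suicides_no')
data.rename({'sex' : 'gender', 'suicides_no' : 'suicides'}, inplace = True, axis = 1)
data.columns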
# visualising the distribution of countries in the dataset
data['country'].value_counts(normalize = True)
data['country'].value_counts(dropna = False).plot.bar(color = 'cyan', figsize = (24, 8))
# visualising the distribution of years in the dataset
data['year'].value_counts(normalize = True)
data['year'].value_counts(dropna = False).plot.bar(color = 'magenta', figsize = (8, 6))
# total suicides in each age group (age is assumed to be label-encoded as 0 to 5 at this point)
x1 = data[data['age'] == 0]['suicides'].sum()
x2 = data[data['age'] == 1]['suicides'].sum()
x3 = data[data['age'] == 2]['suicides'].sum()
x4 = data[data['age'] == 3]['suicides'].sum()
x5 = data[data['age'] == 4]['suicides'].sum()
x6 = data[data['age'] == 5]['suicides'].sum()
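The six filtered sums above compute one total per age group. As a sketch of an equivalent and shorter form, the same totals can be obtained with a single `groupby`. The frame `toy` and its values are hypothetical and are not taken from the WHO file.

```python
import pandas as pd

# hypothetical rows; 'age' holds integer group codes as in the report
toy = pd.DataFrame({
    'age':      [0, 0, 1, 1, 2, 2],
    'suicides': [5, 7, 1, 2, 10, 20],
})

# one filter per group, as in the report
x1 = toy[toy['age'] == 0]['suicides'].sum()

# the same totals for every group in a single call
totals = toy.groupby('age')['suicides'].sum()

print(x1)
print(totals)
```

The result of `groupby` is indexed by age code, so it can be passed directly to a bar plot.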
data['gender'].value_counts(normalize = True)
data['gender'].value_counts(dropna = False).plot.bar(color = 'black', figsize = (4, 3))
# total population of the 141 countries covered by the suicide statistics
data['population'].sum()
63761315943.0
# Average population
Avg_pop = data['population'].mean()
print(Avg_pop)
1664091.1353742562
# total number of suicides committed in the 141 countries from 1985 to 2016
data['suicides'].sum()
8026455.0
Avg_sui = data['suicides'].mean()
print(Avg_sui)
193.3153901734104
data['population'] = data['population'].fillna(data['population'].median())
data['population'].isnull().any()
False
data['suicides'] = data['suicides'].fillna(0)
data['suicides'].isnull().any()
False
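The two imputation choices above (the median for population, zero for suicides) can be illustrated on a small made-up table. The frame `toy` and its values are hypothetical and are not taken from the WHO file; this is only a sketch of how `fillna` behaves.

```python
import numpy as np
import pandas as pd

# hypothetical toy data, not taken from the WHO file
toy = pd.DataFrame({
    'population': [1000.0, np.nan, 3000.0, 5000.0],
    'suicides':   [10.0, 2.0, np.nan, 7.0],
})

# the median ignores missing values, so it is computed from 1000, 3000 and 5000
toy['population'] = toy['population'].fillna(toy['population'].median())
# a missing suicide count is treated as zero reported cases
toy['suicides'] = toy['suicides'].fillna(0)

print(toy)
print(toy.isnull().any().any())
```

Filling missing counts with zero lowers the mean of that column, so averages computed before and after imputation will differ.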
x = data.iloc[:,:-1]
y = data.iloc[:,-1]
print(x.shape)
print(y.shape)
(43776, 4)
(43776,)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
(32832, 4)
(32832,)
(10944, 4)
(10944,)
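The shapes above correspond to a 75/25 split of the 43776 rows (32832 for training, 10944 for testing). The call that produced them is not shown in this report. One way to obtain such a split, assuming scikit-learn's `train_test_split` with `test_size = 0.25` and an arbitrary `random_state`, is sketched below on placeholder arrays of the same shape.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# placeholder arrays with the same shapes as x and y in the report
x_demo = np.zeros((43776, 4))
y_demo = np.zeros(43776)

# test_size and random_state are assumptions; the report does not state them
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size = 0.25, random_state = 0)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```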
# min max scaling
# creating a scaler
mm = MinMaxScaler()
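Min-max scaling maps each feature to the range [0, 1] using x' = (x - min) / (max - min). The calls that apply the scaler are not shown in this report. A common pattern, sketched here on hypothetical arrays, is to fit the scaler on the training data only and reuse the fitted minimum and maximum for the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical feature matrices (rows are samples, columns are features)
train = np.array([[0.0, 100.0], [5.0, 300.0], [10.0, 500.0]])
test = np.array([[2.5, 200.0], [20.0, 500.0]])

mm = MinMaxScaler()
train_scaled = mm.fit_transform(train)   # learns min and max per column
test_scaled = mm.transform(test)         # reuses the training min and max

print(train_scaled)
print(test_scaled)   # values outside [0, 1] are possible for unseen data
```

Fitting on the training data alone keeps information about the test set out of the preprocessing step.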
# visualising the principal components that will explain the highest share of variance
#explained_variance = pca.explained_variance_ratio_
#print(explained_variance)
wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
    km.fit(x_train)
    wcss.append(km.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('no. of clusters')
plt.ylabel('WCSS')
plt.show()
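WCSS (the `inertia_` attribute of a fitted `KMeans`) is the sum of squared distances from each point to its nearest cluster centre. It always tends to fall as clusters are added, so the elbow method looks for the point after which the improvement becomes small. The sketch below shows this on synthetic data with three well separated groups; the data and the choice of centres are assumptions for illustration and are unrelated to the WHO dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# synthetic data with 3 well separated groups (an assumption for illustration)
points, _ = make_blobs(n_samples = 300, centers = [[0, 0], [10, 10], [-10, 10]],
                       cluster_std = 1.0, random_state = 0)

wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters = k, init = 'k-means++', n_init = 10, random_state = 0)
    km.fit(points)
    wcss.append(km.inertia_)   # within-cluster sum of squares for this k

# drop in WCSS gained by each extra cluster; it shrinks sharply after the elbow
drops = -np.diff(wcss)
print(np.round(wcss, 1))
print(np.round(drops, 1))
```

On this data the drop from two to three clusters is far larger than the drop from three to four, which marks three as the elbow.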
MSE : 117021.75686150225
RMSE : 342.08442943446323
r2_score : 0.8003564233332043
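The three figures above are standard regression metrics; the model that produced them is not shown in this report. RMSE is the square root of MSE, which is consistent with the values reported (the square root of 117021.76 is about 342.08). As a reminder of how the metrics are computed, the sketch below uses scikit-learn on hypothetical true and predicted values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# hypothetical targets and predictions, not results from the report
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print("MSE :", mse)
print("RMSE :", rmse)
print("r2_score :", r2)
```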
CONCLUSION:
This report surveyed various approaches and data sources used in suicide analysis, such as
WHO-based suicide detection systems, analysis of census data, and various blogs and surveys.
It can be inferred that although these models exhibit fairly good accuracy, there is a need
for a better and more robust system that automates suicide prediction with high accuracy
and precision and functions well in a real-time environment.