Country Year Sex Age Suicides - No Population: Finding Missing Values

Q.
1. Derive descriptive statistics regarding this dataset, including measures of central tendency.
Solution
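# Load the WHO suicide statistics data
who <- read.csv(file = 'who_suicide_statistics.csv')
who
# summary() reports min, quartiles, median and mean for the numeric columns
x=summary(who)
write.csv(x,file="Central_Tendance.csv")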

   country               year          sex                age             suicides_no         population
 Length:43776       Min.   :1979   Length:43776       Length:43776       Min.   :    0.0   Min.   :     259
 Class :character   1st Qu.:1990   Class :character   Class :character   1st Qu.:    1.0   1st Qu.:   85113
 Mode  :character   Median :1999   Mode  :character   Mode  :character   Median :   14.0   Median :  380655
                    Mean   :1999                                         Mean   :  193.3   Mean   : 1664091
                    3rd Qu.:2007                                         3rd Qu.:   91.0   3rd Qu.: 1305698
                    Max.   :2016                                         Max.   :22338.0   Max.   :43805214
                                                                         NA's   :2256      NA's   :5460
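summary() reports the mean and median of each numeric column but not the mode. The following is a minimal sketch of computing all three for one column. It uses a small made-up vector in place of who$suicides_no, and the helper name stat_mode is hypothetical (base R has no function for the statistical mode).

```r
# Toy stand-in for who$suicides_no; the values are made up for illustration
suicides <- c(0, 1, 1, 14, 91, NA, 1, 250)

# Hypothetical helper: the most frequent non-missing value
stat_mode <- function(v) {
  v <- v[!is.na(v)]
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# na.rm = TRUE skips missing values instead of returning NA
central <- c(mean   = mean(suicides, na.rm = TRUE),
             median = median(suicides, na.rm = TRUE),
             mode   = stat_mode(suicides))
print(central)
```

When the mean is far above the median, as in the summary output for suicides_no (193.3 against 14.0), the distribution is strongly right-skewed.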
2. The WHO_Suicide_Statistics.csv file contains a few missing entries. Find out how
many rows have missing values. How would you deal with these missing values in your
analysis?
Solution
Finding Missing values
colSums(is.na(who))

    country        year         sex         age suicides_no  population
          0           0           0           0        2256        5460
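colSums(is.na(...)) counts missing values per column, while the question asks for the number of rows. A minimal sketch of the row count using complete.cases(), shown on a made-up data frame rather than the real file:

```r
# Toy stand-in for the who data frame; the values are made up
toy <- data.frame(
  suicides_no = c(5, NA, 3, NA),
  population  = c(1000, 2000, NA, NA)
)

# Missing values per column (what colSums(is.na(...)) reports)
na_per_column <- colSums(is.na(toy))

# Rows that contain at least one missing value
rows_with_na <- sum(!complete.cases(toy))

print(na_per_column)
print(rows_with_na)
```

The row count can be smaller than the sum of the column counts, because a row with missing values in both columns is counted once.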
Dealing with missing values: replace NA with the column mean
install.packages('imputeTS',repos='http://cran.us.r-project.org')
library('imputeTS')
who<-na_mean(who)
colSums(is.na(who))

    country        year         sex         age suicides_no  population
          0           0           0           0           0           0
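For readers without the imputeTS package, the same idea (replace each NA with the mean of its column) can be sketched in base R. The helper name impute_mean and the toy data are assumptions for illustration; this is not the imputeTS implementation.

```r
# Toy stand-in for the who data frame; the values are made up
toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  suicides_no = c(10, NA, 30, 20),
  population  = c(1000, 3000, NA, 5000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NA with the column mean in every numeric column
impute_mean <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
    }
  }
  df
}

filled <- impute_mean(toy)
print(filled)
print(colSums(is.na(filled)))
```

A single overall mean ignores differences between countries and age groups, so computing the mean within each group, or dropping incomplete rows, may be a better fit depending on the analysis.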
3. For a country of your choice, add two new columns (percentage change in population and
percentage change in suicides) and represent graphically if there is any correlation between these
two parameters?
Solution
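library(sqldf)
library(dplyr)   # provides %>% and select()
# Rows for the chosen country
y=sqldf("select * from who where Country='Brazil'")
y=na_mean(y)
y
install.packages("Hmisc" ,repos='http://cran.us.r-project.org' )
# Keep only the two columns to be correlated
mydata <- y %>%
  select(population,suicides_no)
mydata=na_mean(mydata)
head(mydata)
install.packages('corrr' ,repos='http://cran.us.r-project.org')
# Pearson correlation matrix of the two columns
z=cor(mydata, use="everything")
z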
            population suicides_no
population   1.0000000   0.4287129
suicides_no  0.4287129   1.0000000
install.packages('corrgram', repos='http://cran.us.r-project.org')
library(corrgram)
corrgram(mydata, order=TRUE,lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Population and Suicides in Brazil")
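The question also asks for two percentage-change columns. A minimal sketch of one way to add them, assuming the country's data has first been totalled per year; the yearly figures below are made up and the helper name pct_change is hypothetical.

```r
# Toy yearly totals for one country; the values are made up
yearly <- data.frame(
  year        = 2000:2004,
  suicides_no = c(100, 110, 99, 99, 120),
  population  = c(1000, 1020, 1040, 1040, 1092)
)

# Percentage change relative to the previous year;
# the first year has no predecessor, so it gets NA
pct_change <- function(v) c(NA, diff(v) / head(v, -1) * 100)

yearly$pct_change_suicides   <- pct_change(yearly$suicides_no)
yearly$pct_change_population <- pct_change(yearly$population)
print(yearly)

# Correlation between the two percentage-change series, skipping the NA row
r <- cor(yearly$pct_change_population, yearly$pct_change_suicides,
         use = "complete.obs")
print(r)

# Scatter plot of one change against the other
plot(yearly$pct_change_population, yearly$pct_change_suicides,
     xlab = "% change in population", ylab = "% change in suicides",
     main = "Year-on-year percentage changes")
```

Correlating the changes rather than the raw levels checks whether the two series move together from year to year, which the level correlation alone does not show.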
4. Compare the total number of suicides for each age group with total population of the same age
group. Is there any association between these two factors? Also, find the ratio between the total
number of suicides per age group and its population and explain the significance of this ratio.
Solution
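# Total suicides and total population per age group.
# suicides_no is summed here: age is a text label, so sum(age) does not count suicides.
x1=sqldf("select sum(suicides_no) from who where age='15-24 years' group by age")
x1
r1=sqldf("select sum(population) from who where age='15-24 years' group by age")
r1
x2=sqldf("select sum(suicides_no) from who where age='25-34 years' group by age")
r2=sqldf("select sum(population) from who where age='25-34 years' group by age")
x3=sqldf("select sum(suicides_no) from who where age='35-54 years' group by age")
r3=sqldf("select sum(population) from who where age='35-54 years' group by age")
x4=sqldf("select sum(suicides_no) from who where age='5-14 years' group by age")
r4=sqldf("select sum(population) from who where age='5-14 years' group by age")
x5=sqldf("select sum(suicides_no) from who where age='55-74 years' group by age")
r5=sqldf("select sum(population) from who where age='55-74 years' group by age")
x6=sqldf("select sum(suicides_no) from who where age>='75 years' group by age")
r6=sqldf("select sum(population) from who where age>='75 years' group by age")
# Ratio of suicides to population for each age group
ratio_answer=matrix(c(x1/r1,x2/r2,x3/r3,x4/r4,x5/r5,x6/r6))
ratio_answer
Output recorded with the earlier sum(age) queries:
[1,] 8.720088e-06
[2,] 1.506884e-05
[3,] 1.346127e-05
[4,] 1.923038e-06
[5,] 3.298526e-05
[6,] 0.0001182575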
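# AgeGroup (labels) and AgeGroupSum (totals per age group) are not defined in the code shown
png(file = "barchart_suicide.png")
barplot(AgeGroupSum,names.arg=AgeGroup,xlab="Age Group",ylab="Sum Range",col="blue",
        main=" Assignment chart",border="red")
dev.off()   # close the device so the PNG file is written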
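The per-group totals and their ratio can also be computed in one step with aggregate(). The sketch below uses a made-up data frame with only two age groups, and the per-100,000 scaling is an assumption chosen for readability rather than something the assignment prescribes.

```r
# Toy stand-in for the who data frame; the values are made up
toy <- data.frame(
  age         = c("15-24 years", "15-24 years", "75+ years", "75+ years"),
  suicides_no = c(10, 30, 20, 40),
  population  = c(100000, 300000, 50000, 150000)
)

# Total suicides and total population per age group
totals <- aggregate(cbind(suicides_no, population) ~ age, data = toy, FUN = sum)

# Suicides per 100,000 people in each age group
totals$per_100k <- totals$suicides_no / totals$population * 1e5
print(totals)
```

The ratio matters because age groups differ in size: dividing by population turns raw counts into rates, so a small group with many suicides relative to its size is not hidden behind a large group with a higher raw count.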
5. Analyze the number of suicides by age group for male and female. Depict graphically which
category (Male or Female) is more prone to suicide?
Solution
Year Wise Male
library(dplyr)     # filter(), group_by(), summarise(), %>%
library(ggplot2)   # plotting
who%>%
  filter(suicides_no>0 & sex=="male")%>%
  group_by(year,age)%>%
  summarise(total_suicides=sum(suicides_no))%>%
  ggplot(aes(year,total_suicides))+
  geom_col()+
  facet_wrap(~age)+
  labs(title="Year/Age wise for males")+
  scale_x_continuous(limits=c(1978,2017),breaks = seq(1979,2016,by = 5))+
  scale_y_continuous(labels = scales::comma)+
  theme_minimal()
Year Wise Female
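To compare the two sexes directly, the totals per age group can be put side by side in a single chart. This is a minimal base-R sketch on made-up numbers, offered as one possible approach rather than the plot used in the original solution.

```r
# Toy stand-in for the who data frame; the values are made up
toy <- data.frame(
  sex         = c("male", "female", "male", "female"),
  age         = c("15-24 years", "15-24 years", "35-54 years", "35-54 years"),
  suicides_no = c(30, 10, 80, 25)
)

# Cross-tabulate total suicides: one row per sex, one column per age group
by_sex_age <- xtabs(suicides_no ~ sex + age, data = toy)
print(by_sex_age)

# Grouped bars: one pair of bars (female, male) per age group
barplot(by_sex_age, beside = TRUE, legend.text = TRUE,
        xlab = "Age group", ylab = "Total suicides",
        main = "Suicides by age group and sex")
```

With the bars for both sexes next to each other in every age group, the category with the higher totals can be read off directly.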
6. For a particular year, find out the top 10 (Ten) countries which have the highest number of
suicides and plot the visualization graph between country and number of suicides.
Solution
options(repr.plot.width=9, repr.plot.height=4)
# Totals across all years; to restrict to one year, add a year condition to filter()
who%>%
  filter(suicides_no>0)%>%
  group_by(country)%>%
  summarise(total_suicides=sum(suicides_no))%>%
  arrange(desc(total_suicides))%>%
  head(10)%>%
  ggplot(aes(reorder(country,total_suicides),total_suicides))+
  geom_col()+
  coord_flip()+
  labs(title="Top 10 countries",y="Total counts",x="Country")+
  scale_y_continuous(labels = scales::comma)+
  theme_minimal()
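Since the question asks about a particular year, here is a minimal base-R sketch of the ranking restricted to one year. The data frame is made up and target_year is a hypothetical choice.

```r
# Toy stand-in for the who data frame; the values are made up
toy <- data.frame(
  country     = c("A", "A", "B", "B", "C", "C"),
  year        = c(2010, 2010, 2010, 2011, 2010, 2010),
  suicides_no = c(5, 7, 50, 99, 20, NA)
)

target_year <- 2010  # hypothetical choice of year

# Keep only the chosen year
one_year <- toy[toy$year == target_year, ]

# Total suicides per country; the formula interface drops rows with NA
totals <- aggregate(suicides_no ~ country, data = one_year, FUN = sum)

# Sort in descending order and keep at most ten countries
top <- head(totals[order(-totals$suicides_no), ], 10)
print(top)
```

The resulting data frame can be passed to the same plotting code as in the solution in place of the all-years totals.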
7. Does population influence the total number of suicides? Justify your answer
Solution
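The chart above suggests that countries with large populations tend to record high total numbers of suicides. This is consistent with the positive correlation (0.43) between population and suicides found for Brazil in question 3.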
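One way to back this answer with numbers is to correlate population with total suicides across countries, and then look at the rate per 100,000 to see what remains once population size is factored out. The country totals below are made up for illustration only.

```r
# Toy country-level totals; the values are made up
toy <- data.frame(
  country     = c("A", "B", "C", "D"),
  population  = c(1e6, 5e6, 2e7, 8e7),
  suicides_no = c(120, 400, 2500, 7000)
)

# Pearson correlation between population and total suicides
r_total <- cor(toy$population, toy$suicides_no)
print(r_total)

# Rate per 100,000 people removes the effect of population size
toy$per_100k <- toy$suicides_no / toy$population * 1e5
print(toy)
```

A high correlation for the totals together with similar rates across countries would indicate that population mainly scales the count, without necessarily changing the underlying risk.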
8. For any five countries, create a least-squares model of the total number of suicides and estimate the total
number of suicides in the next ten years from the last available data (for example, the last available data for
Thailand is 2016) in the file.
I have done this; please verify.
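# last: data for one country (its definition is not shown)
plot(last$year,last$suicides_no,
     main="year and suicides numbers")
# Suicides are modelled as a function of year so the fit can be extrapolated to future years
fit <- lm(suicides_no ~ year, data = last)
fit
Coefficients recorded with the earlier fit, lm(as.numeric(last$year)~as.numeric(last$suicides_no)), which regressed year on suicides:
(Intercept) as.numeric(last$suicides_no)
1.997e+03 4.162e-03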
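To estimate future values, the response in the least-squares model needs to be the number of suicides and the predictor the year. A minimal sketch for one country follows. The yearly totals are made up, and the data frame name last is reused here only to match the code above.

```r
# Toy yearly totals for one country; the values are made up
last <- data.frame(
  year        = 2007:2016,
  suicides_no = c(100, 104, 109, 111, 118, 121, 125, 131, 134, 140)
)

# Least-squares line: suicides as a function of year
fit <- lm(suicides_no ~ year, data = last)
print(coef(fit))

# The ten years after the last year in the data
future <- data.frame(year = (max(last$year) + 1):(max(last$year) + 10))

# Extrapolate the fitted line to those years
future$predicted <- predict(fit, newdata = future)
print(future)
```

For five countries, the same steps can be repeated on each country's yearly totals. A straight line extrapolated ten years ahead is a rough estimate at best, so the predictions are better read as a trend than as a forecast.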
