
1. Derive descriptive statistics regarding this dataset, including measures of central tendency.
Solution
who <- read.csv(file = 'who_suicide_statistics.csv')
who
# Derive descriptive statistics regarding this dataset, including measures of central tendency.

x=summary(who)

write.csv(x,file="Central_Tendance.csv")
                                                                                                 

   country               year           sex                age             suicides_no         population
 Length:43776       Min.   :1979   Length:43776       Length:43776       Min.   :    0.0   Min.   :     259
 Class :character   1st Qu.:1990   Class :character   Class :character   1st Qu.:    1.0   1st Qu.:   85113
 Mode  :character   Median :1999   Mode  :character   Mode  :character   Median :   14.0   Median :  380655
                    Mean   :1999                                         Mean   :  193.3   Mean   : 1664091
                    3rd Qu.:2007                                         3rd Qu.:   91.0   3rd Qu.: 1305698
                    Max.   :2016                                         Max.   :22338.0   Max.   :43805214
                                                                         NA's   :2256      NA's   :5460
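summary() reports the mean and median but not the mode. A minimal sketch of computing all three measures of central tendency in base R follows; the vector is hypothetical toy data standing in for who$suicides_no, and stat_mode is a helper name introduced here only for illustration.

```r
# Hypothetical toy values standing in for who$suicides_no (not real WHO data)
suicides <- c(0, 1, 1, 14, 91, NA, 200)

# Mean and median, skipping missing values
mean_val   <- mean(suicides, na.rm = TRUE)
median_val <- median(suicides, na.rm = TRUE)

# Base R has no function for the statistical mode, so define a small helper
# (stat_mode is an illustrative name, not part of any package)
stat_mode <- function(v) {
  v <- v[!is.na(v)]
  counts <- table(v)                       # frequency of each distinct value
  as.numeric(names(counts)[which.max(counts)])
}
mode_val <- stat_mode(suicides)

print(c(mean = mean_val, median = median_val, mode = mode_val))
```

If several values tie for the highest frequency, this helper returns the smallest of them.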

2. The WHO_Suicide_Statistics.csv file contains a few missing entries. Find out how
many rows have missing values. How would you deal with these missing values in your
analysis?
Solution

Finding Missing values

colSums(is.na(who))

    country        year         sex         age suicides_no  population
          0           0           0           0        2256        5460
Dealing with missing values: replace each NA with the column mean

install.packages('imputeTS',repos='http://cran.us.r-project.org')
library('imputeTS')
who<-na_mean(who)
colSums(is.na(who))

    country        year         sex         age suicides_no  population
          0           0           0           0           0           0
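colSums(is.na(who)) counts missing values per column, while the question asks for the number of rows. A minimal sketch of the row count, using a hypothetical toy data frame (who_toy is not the real file), could look like this. It also shows dropping incomplete rows as an alternative to mean imputation.

```r
# Hypothetical toy data standing in for the WHO file
who_toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  suicides_no = c(10, NA, 3, NA),
  population  = c(1000, 2000, NA, NA)
)

# Missing values per column (same idea as the listing above)
na_per_column <- colSums(is.na(who_toy))

# Number of rows with at least one missing value
rows_with_na <- sum(!complete.cases(who_toy))

# Alternative to imputation: keep only the fully observed rows
who_complete <- na.omit(who_toy)

print(na_per_column)
print(rows_with_na)
print(nrow(who_complete))
```

Dropping rows avoids replacing a small country's missing count with a global mean, at the cost of losing observations; which choice suits the analysis is a judgment call.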

3. For a country of your choice, add two new columns (percentage change in population and
percentage change in suicides) and represent graphically if there is any correlation between these
two parameters?              
Solution
library(sqldf)
library(dplyr)   # provides %>% and select() used below
y=sqldf("select * from who where Country='Brazil'")

y=na_mean(y)
y
install.packages("Hmisc" ,repos='http://cran.us.r-project.org' )

mydata <- y %>%
  select(population,suicides_no)
mydata=na_mean(mydata)
head(mydata)
install.packages('corrr' ,repos='http://cran.us.r-project.org')

z=cor(mydata, use="everything")
z
            population suicides_no
population   1.0000000   0.4287129
suicides_no  0.4287129   1.0000000

install.packages('corrgram', repos='http://cran.us.r-project.org')
library(corrgram)
corrgram(mydata, order=TRUE,lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Population and Suicides in Brazil")
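The question asks for two percentage-change columns, so a rough sketch of that step is given here in base R. The data frame brazil_toy and the helper pct_change are hypothetical and use one row per year; with the real file, the rows for each year (one per sex and age group) would first need to be summed into yearly totals.

```r
# Hypothetical yearly totals for one country (not real WHO figures)
brazil_toy <- data.frame(
  year        = 2000:2004,
  population  = c(1000, 1100, 1210, 1210, 1331),
  suicides_no = c(10, 12, 12, 15, 18)
)

# Year-over-year percentage change; the first year has no predecessor, so it is NA
pct_change <- function(v) c(NA, diff(v) / head(v, -1) * 100)

brazil_toy$pop_pct_change      <- pct_change(brazil_toy$population)
brazil_toy$suicides_pct_change <- pct_change(brazil_toy$suicides_no)

# Correlation between the two change columns, ignoring the leading NA
r <- cor(brazil_toy$pop_pct_change, brazil_toy$suicides_pct_change,
         use = "complete.obs")

print(brazil_toy)
print(r)
```

A scatter plot of the two new columns, for example with plot(), would then show graphically whether the changes move together.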

4. Compare the total number of suicides for each age group with total population of the same age
group. Is there any association between these two factors? Also, find the ratio between the total
number of suicides per age group and its population and explain the significance of this ratio.                  
Solution

# Total suicides (x) and total population (r) for each age group.
# The totals are taken over suicides_no; summing the age column does not count suicides.
x1=sqldf("select sum(suicides_no) from who where age='15-24 years' group by age")
x1
r1=sqldf("select sum(population) from who where age='15-24 years' group by age")
r1

x2=sqldf("select sum(suicides_no) from who where age='25-34 years' group by age")
r2=sqldf("select sum(population) from who where age='25-34 years' group by age")

x3=sqldf("select sum(suicides_no) from who where age='35-54 years' group by age")
r3=sqldf("select sum(population) from who where age='35-54 years' group by age")

x4=sqldf("select sum(suicides_no) from who where age='5-14 years' group by age")
r4=sqldf("select sum(population) from who where age='5-14 years' group by age")

x5=sqldf("select sum(suicides_no) from who where age='55-74 years' group by age")
r5=sqldf("select sum(population) from who where age='55-74 years' group by age")

x6=sqldf("select sum(suicides_no) from who where age>='75 years' group by age")
r6=sqldf("select sum(population) from who where age>='75 years' group by age")

# Ratio of total suicides to total population for each age group
ratio_answer=matrix(c(x1/r1,x2/r2,x3/r3,x4/r4,x5/r5,x6/r6))
ratio_answer

Output recorded from the original run, which summed the age column and divided the 5-14 group by the 35-54 population; these values are therefore not suicide-to-population ratios and need to be regenerated with the queries above:

[1,] 8.720088e-06
[2,] 1.506884e-05
[3,] 1.346127e-05
[4,] 1.923038e-06
[5,] 3.298526e-05
[6,] 0.0001182575

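# AgeGroup and AgeGroupSum are not defined in the original listing; they are assumed
# here to be the age-group labels and the suicide totals x1..x6 computed above.
AgeGroup=c("15-24","25-34","35-54","5-14","55-74","75+")
AgeGroupSum=c(x1[[1]],x2[[1]],x3[[1]],x4[[1]],x5[[1]],x6[[1]])

png(file = "barchart_suicide.png")
barplot(AgeGroupSum,names.arg=AgeGroup,xlab="Age group",ylab="Total suicides",col="blue",
main=" Assignment chart",border="red")
dev.off()   # close the device so the PNG file is written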
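The same per-group totals and ratio can be obtained in one step with aggregate(). The sketch below uses a hypothetical toy data frame (who_toy) with only two age groups; the per_100k column is an added convenience, since a rate per 100,000 people is easier to read than a raw ratio.

```r
# Hypothetical toy data standing in for the WHO file
who_toy <- data.frame(
  age         = c("5-14 years", "5-14 years", "75+ years", "75+ years"),
  suicides_no = c(1, 3, 20, 30),
  population  = c(100000, 100000, 50000, 50000)
)

# Sum suicides and population within each age group
totals <- aggregate(cbind(suicides_no, population) ~ age,
                    data = who_toy, FUN = sum)

# Ratio of suicides to population, and the same figure per 100,000 people
totals$ratio    <- totals$suicides_no / totals$population
totals$per_100k <- totals$ratio * 1e5

print(totals)
```

The ratio matters because raw totals mostly reflect how many people are in each group; dividing by population makes the age groups comparable.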
5. Analyze the number of suicides by age group for male and female. Depict graphically which
category (Male or Female) is more prone to suicide? 
Solution
Year Wise Male
library(dplyr)
library(ggplot2)
who%>%
filter(suicides_no>0 & sex=="male")%>%
group_by(year,age)%>%
summarise(total_suicides=sum(suicides_no))%>%
ggplot(aes(year,total_suicides))+
geom_col()+
facet_wrap(~age)+
labs(title="Year/Age wise for males")+
scale_x_continuous(limits=c(1978,2017),breaks = seq(1979,2016,by = 5))+
scale_y_continuous(labels = scales::comma)+
theme_minimal()
Year Wise Female
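To compare the two sexes directly, a cross-tabulation of total suicides by sex and age group may be enough. This is only a sketch on hypothetical toy data (who_toy), not the author's plotting code.

```r
# Hypothetical toy data standing in for the WHO file
who_toy <- data.frame(
  sex         = c("male", "female", "male", "female"),
  age         = c("15-24 years", "15-24 years", "25-34 years", "25-34 years"),
  suicides_no = c(30, 10, 40, 12)
)

# Total suicides for every sex / age-group combination
by_sex_age <- xtabs(suicides_no ~ sex + age, data = who_toy)

# Overall totals per sex
totals_by_sex <- rowSums(by_sex_age)

print(by_sex_age)
print(totals_by_sex)
```

Passing by_sex_age to barplot() with beside = TRUE would draw the male and female bars side by side for each age group.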
6. For a particular year, find out the top 10 (Ten) countries which have the highest number of
suicides and plot the visualization graph between country and number of suicides. 

Solution

options(repr.plot.width=9, repr.plot.height=4)

who%>%
filter(suicides_no>0)%>%
group_by(country)%>%
summarise(total_suicides=sum(suicides_no))%>%
arrange(desc(total_suicides))%>%
head(10)%>%
ggplot(aes(reorder(country,total_suicides),total_suicides))+
geom_col()+
coord_flip()+
labs(title="Top 10 countries",y="Total counts",x="Country")+
scale_y_continuous(labels = scales::comma)+
theme_minimal()
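The pipeline above totals suicides over all years, whereas the question asks about a particular year. A minimal base R sketch of restricting the ranking to one year follows; who_toy and target_year are hypothetical, and head(..., 2) would be head(..., 10) on the real data.

```r
# Hypothetical toy data standing in for the WHO file
who_toy <- data.frame(
  country     = c("A", "A", "B", "B", "C", "C"),
  year        = c(2010, 2011, 2010, 2010, 2010, 2011),
  suicides_no = c(5, 50, 7, 8, 3, 90)
)

target_year <- 2010   # assumed year of interest

# Keep only the chosen year, then total suicides per country
one_year <- subset(who_toy, year == target_year)
totals   <- aggregate(suicides_no ~ country, data = one_year, FUN = sum)

# Sort in descending order and keep the top entries
top <- head(totals[order(-totals$suicides_no), ], 2)

print(top)
```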
7. Does population influence the total number of suicides? Justify your answer

Solution
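The graph above suggests that countries with larger populations tend to record higher total numbers of suicides. The correlation of 0.43 between population and suicides_no computed for Brazil points in the same direction, although it is only moderate.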

8. For any five countries, create a least-squares model of the total number of suicides and estimate the total
number of suicides in the next ten years from the last available data (for example, the last available data for
Thailand is 2016) in the file.
I have done this; please verify it.
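# `last` is assumed to hold one country's rows with year and suicides_no columns;
# it is not defined in this listing.
plot(last$year,last$suicides_no,
main="year and suicides numbers")
# Regress suicides on year so the model can project suicides for future years
fit <- lm(as.numeric(suicides_no)~as.numeric(year), data=last)
fit

Coefficients recorded from the original run, which fitted year as a function of suicides_no, so they do not describe the model above:

(Intercept) as.numeric(last$suicides_no)
1.997e+03 4.162e-03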
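Estimating the next ten years means fitting suicides as a function of year and then calling predict() on the future years. The sketch below does this for a single hypothetical country; thai_toy holds made-up yearly totals that follow an exact straight line, so the numbers only illustrate the mechanics.

```r
# Hypothetical yearly totals for one country (made-up, exactly linear)
thai_toy <- data.frame(
  year        = 2007:2016,
  suicides_no = 100 + 5 * (0:9)
)

# Least-squares fit: suicides as a function of year
fit <- lm(suicides_no ~ year, data = thai_toy)

# Ten years following the last available year
future <- data.frame(year = 2017:2026)
future$predicted <- as.numeric(predict(fit, newdata = future))

print(coef(fit))
print(future)
```

For five countries, the same fit and prediction would be repeated on each country's yearly totals. A straight-line extrapolation ten years ahead is a strong assumption, so the estimates should be read with caution.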
