x=summary(who)
write.csv(x,file="Central_Tendance.csv")
2. The WHO_Suicide_Statistics.csv file contains a few missing entries. Find out how
many rows have missing values. How would you deal with these missing values in your
analysis?
Solution
colSums(is.na(who))
install.packages('imputeTS',repos='http://cran.us.r-project.org')
library('imputeTS')
who<-na_mean(who)
colSums(is.na(who))
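The column-wise counts above do not directly say how many rows are affected. A minimal sketch of a row count, assuming a small stand-in data frame (`toy` and its values are hypothetical, not taken from the WHO file), could use base R's `complete.cases`:

```r
# Hypothetical stand-in for the WHO data; all values are made up for illustration.
toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  suicides_no = c(10, NA, 5, NA),
  population  = c(1000, 2000, NA, NA)
)

# Missing values per column (same idea as colSums(is.na(who)) above).
na_per_column <- colSums(is.na(toy))

# Number of rows that have at least one missing value.
rows_with_na <- sum(!complete.cases(toy))

print(na_per_column)
print(rows_with_na)
```

Mean imputation, as done with `na_mean` above, keeps every row but can distort totals. Dropping incomplete rows with `na.omit` is a possible alternative when only a small share of rows is affected.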
3. For a country of your choice, add two new columns (percentage change in population and
percentage change in suicides) and show graphically whether there is any correlation between
these two parameters.
Solution
library(sqldf)
y=sqldf("select * from who where Country='Brazil'")
y=na_mean(y)
y
install.packages("Hmisc" ,repos='http://cran.us.r-project.org' )
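# mydata is not defined above; it is assumed here to be the two numeric columns of y
mydata=y[,c("population","suicides_no")]
z=cor(mydata, use="everything")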
z
            population suicides_no
population   1.0000000   0.4287129
suicides_no  0.4287129   1.0000000
install.packages('corrgram', repos='http://cran.us.r-project.org')
library(corrgram)
corrgram(mydata, order=TRUE,lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Population and Suicides in Brazil")
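The question also asks for two percentage-change columns, which the solution above does not compute. A minimal sketch, assuming the chosen country's data has already been summed to one row per year (the `yearly` data frame, its values, and the helper `pct_change` are hypothetical):

```r
# Hypothetical yearly totals for one country; all values are made up.
yearly <- data.frame(
  year        = 2000:2003,
  population  = c(1000, 1100, 1210, 1210),
  suicides_no = c(50, 55, 66, 33)
)

# Percentage change relative to the previous year.
# The first year has no predecessor, so its change is NA.
pct_change <- function(v) c(NA, diff(v) / head(v, -1) * 100)

yearly$pct_change_population <- pct_change(yearly$population)
yearly$pct_change_suicides   <- pct_change(yearly$suicides_no)

# Correlation between the two change series, skipping the leading NA.
r <- cor(yearly$pct_change_population, yearly$pct_change_suicides,
         use = "complete.obs")

# Scatter plot of one change series against the other.
plot(yearly$pct_change_population, yearly$pct_change_suicides,
     xlab = "% change in population", ylab = "% change in suicides",
     main = "Year-over-year changes")
print(yearly)
```

Correlating the changes rather than the raw levels helps avoid a correlation that only reflects both series growing over time.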
4. Compare the total number of suicides for each age group with total population of the same age
group. Is there any association between these two factors? Also, find the ratio between the total
number of suicides per age group and its population and explain the significance of this ratio.
Solution
ratio_answer=matrix(c(x1/r1,x2/r2,x3/r3,x4/r4,x5/r5,x6/r6))
[1,] 8.720088e-06
[2,] 1.506884e-05
[3,] 1.346127e-05
[4,] 1.923038e-06
[5,] 3.298526e-05
[6,] 0.0001182575
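The variables `x1`..`x6` and `r1`..`r6` are not defined in the solution above. One way the per-group totals and ratios might be obtained is sketched below with base R's `aggregate`, assuming the data has an `age` column (the column name, the `toy` data frame, and its values are hypothetical):

```r
# Hypothetical data with an age-group column; all values are made up.
toy <- data.frame(
  age         = c("15-24 years", "15-24 years", "75+ years", "75+ years"),
  suicides_no = c(10, 20, 30, 50),
  population  = c(100000, 200000, 40000, 40000)
)

# Total suicides and total population per age group.
totals <- aggregate(cbind(suicides_no, population) ~ age, data = toy, FUN = sum)

# Suicides per person, plus the same figure scaled to a rate per 100,000 people.
totals$ratio    <- totals$suicides_no / totals$population
totals$per_100k <- totals$ratio * 1e5

print(totals)
```

The ratio matters because it removes the effect of group size: a group with fewer total suicides can still have a higher rate if its population is much smaller.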
png(file = "barchart_suicide.png")
Solution
library(dplyr)
library(ggplot2)
options(repr.plot.width=9, repr.plot.height=4)
who%>%
filter(suicides_no>0)%>%
group_by(country)%>%
summarise(total_suicides=sum(suicides_no))%>%
arrange(desc(total_suicides))%>%
head(10)%>%
ggplot(aes(reorder(country,total_suicides),total_suicides))+
geom_col()+
coord_flip()+
labs(title="Top 10 countries",y="Total counts",x="Country")+
scale_y_continuous(labels = scales::comma)+
theme_minimal()
7. Does population influence the total number of suicides? Justify your answer.
Solution
The graph above shows that countries with high populations tend to have high total numbers of suicides.
For any five countries, create a least-squares model of the total number of suicides and estimate the total
number of suicides in the next ten years from the last available data (for example, the last available data for
Thailand is 2016) in the file.
I have done this; please verify it.
plot(last$year,last$suicides_no,
main="year and suicides numbers"
)
fit <- lm(as.numeric(last$year)~as.numeric(last$suicides_no))
fit
(Intercept) as.numeric(last$suicides_no)
1.997e+03 4.162e-03
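In the fit above the formula has `year` on the left of `~`, so the model predicts the year from the number of suicides; the intercept of about 1997 is consistent with that. To estimate suicides for future years, the response would normally be `suicides_no` and the predictor `year`. A minimal sketch for one country, assuming one row per year (the `toy` data frame and its values are hypothetical):

```r
# Hypothetical yearly totals for one country; all values are made up.
toy <- data.frame(
  year        = 2007:2016,
  suicides_no = c(100, 104, 108, 112, 116, 120, 124, 128, 132, 136)
)

# Least-squares fit: response (suicides_no) on the left, predictor (year) on the right.
fit <- lm(suicides_no ~ year, data = toy)

# The ten years following the last available year.
last_year <- max(toy$year)
future <- data.frame(year = (last_year + 1):(last_year + 10))

# Predicted totals for those years from the fitted line.
future$predicted_suicides <- predict(fit, newdata = future)

print(coef(fit))
print(future)
```

The same steps could be repeated for each of the five countries. A straight line extrapolated ten years ahead is a rough estimate, so the fit is worth checking with a plot of the data and `abline(fit)`.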