
DANANG UNIVERSITY

OF SCIENCE AND TECHNOLOGY

LABORATORY REPORT

Mini Projects:
Statistics and Probability

INSTRUCTOR: Prof. Nguyễn Chánh Tú


STUDENTS: Hứa Thị Bình Nguyên
Hồ Hoàng Hảo
Lê Gia Quang
Bùi Minh Hiệp
Trương Công Thắng
Nguyễn Văn Hậu

Da Nang, 1/2021
I. Introduction to problems
We investigate the math scores of students enrolling in Da Nang University of
Technology (DUT). We collect data from 4 groups in FAST, analyze these data,
and then draw conclusions about the math score of DUT as a whole.
Parameter of interest: population mean of the math score, μ
Initial conjecture: Is the mean μ > 7.8?
Members and tasks:
● Ho Hoang Hao (Leader)
● Hua Thi Binh Nguyen (Descriptive statistics analyst)
● Truong Cong Thang (Descriptive statistics analyst)
● Bui Minh Hiep (Inferential statistics analyst)
● Le Gia Quang (Descriptive statistics analyst)
● Nguyen Van Hau (Hypothesis tester)

II. Data collection methods (in case you do not use the provided data):
We conducted surveys using Google Forms and a quiz to collect samples and
build the data tables.

We identified sampling error and several non-sampling errors: questioning
error, cheating error, unwillingness error, and selection error.

III. Analysis of Results


1. Descriptive Statistics
1.1 Descriptive statistics
First, we import the data sheet into R as the data frame mathscore.
We use attach(mathscore) to add the data frame to R's search path, so that
its columns can be referred to by name.
We load the plotting library with library(ggpubr).
Next, we use summary(mathscore) to obtain the min, median, mean, and max of
each group. (summary is a generic function; applied to a data frame, it
reports these values together with the quartiles for every numeric column.)
We see that Group 4 has the lowest minimum score, that the median values are
approximately equal, and that the mean values are the same.
The highest math score belongs to Group 2.
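
The summary step can be sketched as follows. This is only an illustration: the values are invented stand-ins for the survey data, and the column names `Group 1` and `Group 2` are an assumption about how the data sheet is laid out.

```r
# Invented stand-in values; the real survey data are not reproduced here.
# The column names `Group 1`, `Group 2` are an assumption about the data sheet.
mathscore <- data.frame(
  `Group 1` = c(7.8, 8.0, 8.2, 8.4, 8.6),
  `Group 2` = c(7.4, 8.0, 8.2, 8.6, 9.4),
  check.names = FALSE  # keep the space in the column names
)

# Min, quartiles, median, mean, and max for every column
print(summary(mathscore))

# Accessing a column with $ works without attach()
g1 <- summary(mathscore$`Group 1`)
print(g1)
```

Referring to columns as mathscore$`Group 1` is one way to avoid depending on attach(), which can hide name clashes on the search path.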

Next, we use sd(mathscore$`Group …`, na.rm = TRUE). This function computes
the standard deviation of the values in x. If na.rm is TRUE, missing values
are removed before the computation proceeds.
From the results, we see that the standard deviation of Group 4 is the
highest and that of Group 1 is the lowest, meaning that Group 4 scores are
the most spread out, whereas Group 1 scores are tightly clustered.
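
To show what na.rm changes, here is a small sketch on invented values. The data frame and its column names are hypothetical, and the NA stands for a group with fewer students than the others.

```r
# Invented values; the NA pads a group that has fewer students.
mathscore <- data.frame(
  `Group 1` = c(8.0, 8.1, 8.2, 8.3, NA),
  `Group 4` = c(6.5, 7.5, 8.0, 8.5, 9.5),
  check.names = FALSE
)

# Without na.rm the missing value propagates and the result is NA
sd_with_na <- sd(mathscore$`Group 1`)
print(sd_with_na)

# With na.rm = TRUE the NA is dropped before computing; sapply covers all groups
group_sd <- sapply(mathscore, sd, na.rm = TRUE)
print(round(group_sd, 3))
```

A larger standard deviation (here the hypothetical `Group 4`) means the scores are more spread out around the group mean.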

Next, we plot the numerical variables using the command
hist(mathscore$`Group …`, main = 'Math score of Group … for enrolling to DUT',
xlab = 'Math score', ylab = 'number of students')

The generic function hist computes a histogram of the given data values.
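
A short sketch of the same call on simulated scores, assuming nothing about the real data beyond a score scale around 8. hist also returns the bins it computed, which is useful for checking the plot.

```r
# Simulated scores for illustration only (not the survey data)
set.seed(1)
scores <- round(rnorm(40, mean = 8, sd = 0.5), 1)

# Draw the histogram and keep the returned object
h <- hist(scores,
          main = "Math score of one group (simulated data)",
          xlab = "Math score",
          ylab = "Number of students")

# Bin edges and the number of students in each bin
print(h$breaks)
print(h$counts)
```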
These are the plots for the 4 groups:
Now we use boxplot to plot the given values.
The results are in the image below. We can see that the median of Group 4
is lower than that of the other 3 groups.
We can also see that Group 2 scores spread out the widest, whereas Group 1
scores are tight.
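
The boxplot comparison can be sketched as below on invented values. The column names are hypothetical, and the numbers are chosen only so that one group is visibly wider than the other.

```r
# Invented values for two hypothetical groups
mathscore <- data.frame(
  `Group 1` = c(7.9, 8.0, 8.1, 8.2, 8.3),
  `Group 2` = c(7.0, 7.8, 8.2, 8.8, 9.6),
  check.names = FALSE
)

# One box per column; the returned object holds the five-number summaries
b <- boxplot(mathscore, ylab = "Math score",
             main = "Math score by group (invented data)")

# Row 3 of b$stats is the median; rows 2 and 4 are the lower and upper hinges
medians <- setNames(b$stats[3, ], b$names)
box_width <- setNames(b$stats[4, ] - b$stats[2, ], b$names)
print(medians)
print(box_width)
```

Reading the medians and box widths from the returned object gives numbers to back up what is seen in the picture.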

In addition, we use ggqqplot(mathscore$`Group …`, ylab = "students") to
produce a quantile-quantile (Q-Q) plot for each of the 4 groups.

[Figure: Q-Q plots of Group 1, Group 2, Group 3, and Group 4]

The points lie within the grey band around the reference line, showing that
these data are approximately normally distributed.
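
For readers without ggpubr, a comparable plot can be sketched in base R with qqnorm and qqline. This is a substitute for ggqqplot, not the call used in the report, and the scores are simulated. The correlation at the end is only a rough numerical companion to the visual check.

```r
# Simulated scores for illustration only
set.seed(2)
scores <- rnorm(40, mean = 8, sd = 0.5)

# Base-R normal Q-Q plot with a reference line
qq <- qqnorm(scores, ylab = "Math score")
qqline(scores)

# Correlation between theoretical and sample quantiles;
# values close to 1 are consistent with the points following the line
r <- cor(qq$x, qq$y)
print(r)
```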

Moreover, we examine the total data (all groups combined).

We use summary(mathscore) to obtain the min, median, mean, and max, and
sd(x) to compute the standard deviation of the values in x.
Furthermore, we use dnorm(x), which gives the density of the normal
distribution (the standard normal by default), and qqnorm, a generic
function whose default method produces a normal Q-Q plot of the values in y.
From the results, we see that the peak of the graph lies at approximately
8.0-8.3:

The plot shows a nearly perfect normal distribution curve.
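
One way to draw such a curve is to overlay dnorm on a density-scaled histogram, as sketched below. The data are simulated, with a mean and standard deviation chosen loosely near the totals reported later in the report; this is not the code used for the figure.

```r
# Simulated stand-in for the combined scores (not the survey data)
set.seed(3)
total <- rnorm(155, mean = 8.1, sd = 0.57)
m <- mean(total)
s <- sd(total)

# Histogram on the density scale, with the fitted normal curve on top
hist(total, freq = FALSE,
     main = "Total math score (simulated data)", xlab = "Math score")
curve(dnorm(x, mean = m, sd = s), add = TRUE)

# The fitted normal density peaks at the sample mean
grid <- seq(min(total), max(total), length.out = 1001)
peak <- grid[which.max(dnorm(grid, mean = m, sd = s))]
print(c(mean = m, peak = peak))
```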

1.2 Inferential Statistics


1.2.1. Confidence interval:
As in Section 1.1, we start from the summary of the data.

In addition to that summary, we also compute the standard deviation of the
total.

Next, we calculate the 95% confidence interval by assigning values to the
left and right endpoints.
Using the same code as for Group 1, we calculate the intervals for Group 2,
Group 3, Group 4, and the total.
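
The interval computation can be sketched as follows. This assumes a large-sample z interval, x̄ ± z·s/√n; the report's own code is not shown, so it may have used a t interval instead. The helper name ci_mean is hypothetical, and the inputs are the total mean, standard deviation, and sample size quoted in Section 1.2.2.

```r
# Hypothetical helper: large-sample (z-based) confidence interval for a mean
ci_mean <- function(xbar, s, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # 1.96 for a 95% interval
  half_width <- z * s / sqrt(n)
  c(left = xbar - half_width, right = xbar + half_width)
}

# Totals quoted in Section 1.2.2: mean, standard deviation, sample size
total_ci <- ci_mean(xbar = 8.083548, s = 0.5738373, n = 155)
print(round(total_ci, 4))
```

The same function can be called with each group's own mean, standard deviation, and size to get the per-group intervals.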
And the results:

1.2.2. Hypotheses and Test Procedures:


We carry out the hypothesis test on the math scores of the 4 groups
combined: n = 155.
Step 1: Identify the parameter of interest: μ, the mean math score; x̄, the
sample mean of the total math scores. The hypothesized value is μ0 = 7.8.
Step 2: Null hypothesis H0: μ = 7.8; alternative hypothesis Ha: μ > 7.8.
Step 3: Because n = 155 is large, we can use the z-distribution with
significance level α = 0.05.
Step 4: We apply the formula for the standardized variable:
$$ Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{8.083548 - 7.8}{0.5738373/\sqrt{155}} \approx 6.15 $$
Step 5: Choose α = 0.05.
This is an upper-tailed test, so the critical value is z_0.05 = 1.645.
The z value is larger than this critical value, and also larger than 2.33
(the critical value for α = 0.01) -> we can reject H0.
Step 6: Conclusion:
The true average math score of the whole DUT is larger than 7.8 at the 0.05
significance level.
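
The test can be checked numerically with the figures given above. This is a minimal sketch of an upper-tailed one-sample z test; the variable names are illustrative.

```r
# Figures from the report
xbar  <- 8.083548   # sample mean of the total math scores
s     <- 0.5738373  # sample standard deviation
n     <- 155        # sample size
mu0   <- 7.8        # hypothesized mean under H0
alpha <- 0.05       # significance level

# Test statistic and upper-tail critical value
z      <- (xbar - mu0) / (s / sqrt(n))
z_crit <- qnorm(1 - alpha)

# Upper-tail p-value and decision
p_value <- pnorm(z, lower.tail = FALSE)
reject  <- z > z_crit

print(c(z = z, z_crit = z_crit, p_value = p_value))
print(reject)
```

Changing alpha to 0.01 moves the critical value to about 2.33, and the decision to reject H0 stays the same.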
