Rminiproject Report Team 4

DANANG UNIVERSITY
OF SCIENCE AND TECHNOLOGY
LABORATORY REPORT
Mini Projects:
Statistics and Probability
INSTRUCTOR: Prof. Nguyễn Chánh Tú

STUDENTS: Hứa Thị Bình Nguyên
Hồ Hoàng Hảo
Lê Gia Quang
Bùi Minh Hiệp
Trương Công Thắng
Nguyễn Văn Hậu
Da Nang, 1/2021
I. Introduction to the problem
We investigate the math scores of students enrolling at Da Nang University of
Technology (DUT). To do so, we collect data from 4 groups in FAST, analyze
these data, and then draw conclusions about the math scores of the whole of DUT.
Parameter of interest: the population mean of the math score, μ
Initial conjecture: Is the mean μ > 7.8?
Team members and tasks:
● Ho Hoang Hao (Leader)
● Hua Thi Binh Nguyen (Descriptive statistics analyst)
● Truong Cong Thang (Descriptive statistics analyst)
● Bui Minh Hiep (Inferential statistics analyst)
● Le Gia Quang (Descriptive statistics analyst)
● Nguyen Van Hau (Hypothesis tester)
II. Data collection methods (in case you do not use the provided data):
We conduct surveys using Google Forms and a quiz to collect
samples and build the data tables.
We identified sampling error and several non-sampling errors:
questioning error, cheating error, unwillingness error, and selection
error.
III. Analysis of Results

1. Descriptive Statistics
1.1 Descriptive statistics
First, we import the datasheet to load the data.
We use attach(mathscore) to add the data frame to R's search path,
so that its columns can be referred to by name.
We use library(ggpubr) to load the ggpubr package.
We then use summary(mathscore) to obtain the min, median,
mean, and max of each group. (summary is a generic function; applied to a
data frame, it reports these statistics and the quartiles for each column.)
We see that Group 4 has the lowest minimum score, the median
values are approximately equal, and the mean values are the same.
The highest math score belongs to Group 2.
Next, we use sd(mathscore$`Group …`, na.rm = TRUE). This function
computes the standard deviation of the values in x. If na.rm is
TRUE, missing values are removed before the computation
proceeds.
The results show that the standard deviation of Group 4
is the highest and that of Group 1 is the lowest, meaning that Group 4
scores are the most widely spread, whereas Group 1 scores are tightly clustered.
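The summary and standard deviation steps above can be sketched as follows. This is only an illustration under stated assumptions: the survey data are not reproduced in this report, so the data frame below is simulated, and the group sizes, means, and spreads are hypothetical.

```r
# Hypothetical stand-in for the survey data; the real scores are not reproduced here.
set.seed(1)
mathscore <- data.frame(
  `Group 1` = round(rnorm(40, mean = 8.1, sd = 0.4), 1),
  `Group 2` = round(rnorm(40, mean = 8.1, sd = 0.6), 1),
  `Group 3` = round(rnorm(40, mean = 8.1, sd = 0.5), 1),
  `Group 4` = round(rnorm(40, mean = 8.1, sd = 0.7), 1),
  check.names = FALSE  # keep the spaces in the column names
)
mathscore$`Group 4`[1] <- NA  # one missing answer, to show the effect of na.rm

# Min, quartiles, median, mean, and max for every group
print(summary(mathscore))

# Standard deviation of each group, ignoring missing values
group_sd <- sapply(mathscore, sd, na.rm = TRUE)
print(round(group_sd, 3))
```

Without na.rm = TRUE, sd returns NA for any group that contains a missing value, which is why the argument is set here.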
Next, we plot the numerical variables using the
command hist(mathscore$`Group …`, main = 'Math score of
Group … for enrolling to DUT', xlab = 'Math score', ylab
= 'number of students').
The generic function hist computes a histogram of the given data
values.
These are the plots of the 4 groups:
Now we use boxplot to plot the given values.
The results are in the image below. We can see that the median of
Group 4 is lower than those of the other 3 groups.
We can also clearly see that Group 2 scores are the most widely spread,
whereas Group 1 scores are tightly clustered.
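A minimal sketch of the histogram and boxplot steps, using base R only. The scores are simulated and the object names (scores, h, b, medians) are hypothetical; the plots in the report were drawn from the real survey data.

```r
# Simulated scores for illustration only (not the survey data)
set.seed(2)
scores <- list(
  `Group 1` = rnorm(40, mean = 8.1, sd = 0.4),
  `Group 2` = rnorm(40, mean = 8.1, sd = 0.6),
  `Group 3` = rnorm(40, mean = 8.1, sd = 0.5),
  `Group 4` = rnorm(40, mean = 7.9, sd = 0.7)
)

# Histogram of one group; hist() also returns the bin counts invisibly
h <- hist(scores$`Group 1`,
          main = "Math score of Group 1 (simulated)",
          xlab = "Math score", ylab = "number of students")

# Side-by-side boxplots of all groups; boxplot() returns the plotted statistics
b <- boxplot(scores, ylab = "Math score",
             main = "Math score by group (simulated)")

# Row 3 of b$stats holds the median of each box
medians <- b$stats[3, ]
names(medians) <- b$names
print(round(medians, 2))
```

Reading the medians from the returned object gives a numerical check on what is seen in the plot.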
Besides, we use ggqqplot(mathscore$`Group …`, ylab = "students")
to obtain the Quantile-Quantile plot of each of the 4 groups.
[Figure: Q-Q plots of Group 1, Group 2, Group 3, and Group 4]
The points lie within the grey band around the reference line, suggesting that
these data are approximately normally distributed.
Moreover, we examine the total data (all groups combined).
We use summary(mathscore) to obtain the min, median, mean,
and max. (summary is a generic function; applied to a data frame, it
reports these statistics and the quartiles for each column.)
We also use sd(x), which computes the standard deviation of
the values in x.
Furthermore, we use dnorm(x), which gives the normal density, and
qqnorm, a generic function whose default method
produces a normal QQ plot of the values in y.
The results show that the peak of the graph lies at
approximately 8.0-8.3:
The plot closely follows the curve of a normal
distribution.
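The density and QQ-plot checks described above can be sketched as follows. The vector total is simulated with a mean and standard deviation close to the values reported later in this report; it is a hypothetical stand-in, not the collected data.

```r
# Simulated stand-in for the combined scores of all groups
set.seed(3)
total <- rnorm(155, mean = 8.08, sd = 0.57)

m <- mean(total)
s <- sd(total)

# Histogram on the density scale with the fitted normal curve on top
hist(total, freq = FALSE, main = "Total math score (simulated)",
     xlab = "Math score")
curve(dnorm(x, mean = m, sd = s), add = TRUE)

# Normal QQ plot with a reference line
qqnorm(total)
qqline(total)

# The fitted normal density reaches its maximum at the sample mean
peak_height <- dnorm(m, mean = m, sd = s)
print(c(peak_at = m, peak_height = peak_height))
```

If the data are roughly normal, the histogram follows the curve and the QQ points stay close to the reference line.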
1.2 Inferential Statistics

1.2.1. Confidence interval:
As in part 1.1, we use summary to obtain the summary
statistics.
In addition to part 1.1, we also have the standard deviation of the total.
Next, we calculate the 95% confidence interval by assigning values to
find the left and right endpoints.
Using the same code, we can calculate the intervals for Group 2, Group 3, Group 4,
and the total in the same way as for Group 1.
And the results:
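One way to compute the left and right endpoints is sketched below for the total, using the sample size, mean, and standard deviation quoted in section 1.2.2. The report's own code is not shown, so this is an assumption about the method: it uses the normal quantile (large-sample interval), and the variable names are hypothetical.

```r
# Summary statistics of the total, as quoted in section 1.2.2
n    <- 155
xbar <- 8.083548
s    <- 0.5738373

# Large-sample 95% interval: xbar -/+ z * s / sqrt(n)
z      <- qnorm(0.975)   # about 1.96
margin <- z * s / sqrt(n)
left   <- xbar - margin
right  <- xbar + margin

print(round(c(left = left, right = right), 4))
```

With a raw vector of scores x, the same calculation uses length(x), mean(x), and sd(x) in place of the three constants. Replacing qnorm(0.975) with qt(0.975, df = n - 1) gives the t-based interval, which is slightly wider.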
1.2.2. Hypotheses and Test Procedures:

We carry out the hypothesis test procedure on the math scores of the 4
groups, with n = 155.
Step 1: Identify the parameter of interest: μ, the mean math score. x̄ is the
sample mean of the total math scores, and the hypothesized value is
μ0 = 7.8.
Step 2: Null hypothesis H0: μ = 7.8; alternative hypothesis Ha: μ > 7.8.
Step 3: Because n = 155 is large, we can use the z-distribution with
significance level α = 0.05.
Step 4: We apply the formula of the standardized variable to calculate
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{8.083548 - 7.8}{0.5738373/\sqrt{155}} \approx 6.15$$
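Step 5: Choose α = 0.05.
This is an upper-tailed test, so the critical value is z_0.05 = 1.645.
The computed value z ≈ 6.15 is larger than this critical value (and also
larger than 2.33, the critical value for α = 0.01), so we can reject
H0.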
Step 6: Conclusion:
The true average math score of the whole of DUT is larger than 7.8
at the 0.05 significance level.
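The test steps above can be checked numerically with the short sketch below. It uses only the figures quoted in Step 4; the variable names are hypothetical.

```r
# Figures quoted in Step 4
n    <- 155
xbar <- 8.083548
s    <- 0.5738373
mu0  <- 7.8     # value of the mean under H0
alpha <- 0.05

# Standardized test statistic
z <- (xbar - mu0) / (s / sqrt(n))

# Upper-tailed test: reject H0 when z exceeds the (1 - alpha) normal quantile
z_crit  <- qnorm(1 - alpha)              # about 1.645
p_value <- pnorm(z, lower.tail = FALSE)  # P(Z > z) under H0
reject  <- z > z_crit

print(round(c(z = z, z_crit = z_crit), 3))
print(p_value)
print(reject)
```

The p-value gives the same decision as the critical-value comparison: H0 is rejected whenever the p-value is below α.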
