Group: 7
Members:
{Nguyễn Thái Nguyên - QS170069; Lương Hoàng Duy - DE170114; Trần Đinh Khang -
I. Dataset:
The dataset provided by the teacher contains the weights and heights of 30 female students. Here is the
The dataset contains only two variables: weight (kg) and height (cm). In this project, we conduct descriptive and inferential statistical analyses of the data and then introduce a regression model to describe the mathematical relationship between the two variables.
1. Descriptive Analysis:
Using the Analysis ToolPak in Microsoft Excel, we generated the following descriptive statistics table.
Statistic                   Weight (kg)   Height (cm)
Mean                        59.73333      165.6667
Standard Error              1.231701      1.132928
Median                      59.5          166
Mode                        58            170
Standard Deviation          6.746306      6.2053
Sample Variance             45.51264      38.50575
Kurtosis                    0.524973      -0.12615
Skewness                    0.837299      -0.10425
Range                       27            25
Minimum                     50            154
Maximum                     77            179
Sum                         1792          4970
Count                       30            30
Confidence Level (95.0%)    2.519112      2.317097
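Several rows of this table are not independent measurements. The standard error, sample variance and 95% confidence level all follow from the standard deviation and the count. The Python sketch below recomputes them from the table's own values as a consistency check. It assumes that Excel's "Confidence Level" is t × s/√n. It hardcodes the critical value t(0.025, 29) ≈ 2.045230 because Python's standard library has no Student's t quantile function. The helper name `derived_rows` is ours, not Excel's.

```python
import math

# Assumed two-sided 95% critical value of Student's t with n - 1 = 29 degrees
# of freedom (hardcoded: the standard library has no t quantile function).
T_CRIT_95_DF29 = 2.045230


def derived_rows(sd: float, n: int, t_crit: float) -> dict:
    """Recompute the table rows that depend only on the sample SD and the count."""
    se = sd / math.sqrt(n)                   # Standard Error = s / sqrt(n)
    return {
        "sample_variance": sd ** 2,          # Sample Variance = s^2
        "standard_error": se,
        "confidence_level_95": t_crit * se,  # half-width of the 95% CI for the mean
    }


weight = derived_rows(6.746306, 30, T_CRIT_95_DF29)
height = derived_rows(6.2053, 30, T_CRIT_95_DF29)

for name, rows in (("Weight", weight), ("Height", height)):
    print(name, {key: round(value, 4) for key, value in rows.items()})
```

The recomputed values agree with the Excel output to about six significant figures. Any small differences come from the rounding of the printed standard deviations.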
The average weight of the students was approximately 59.73 kg, with a standard deviation of 6.75 kg, indicating a moderate amount of variability in the weights. The mode is 58 kg, which is the weight that appears most often in the dataset.
The skewness value of 0.84 indicates that the weight distribution is positively skewed. This
means that the tail of the distribution extends more towards higher weights, suggesting that there
might be a few students with relatively higher weights compared to the majority.
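For reference, Excel's SKEW function computes the adjusted Fisher-Pearson sample skewness, n/((n-1)(n-2)) · Σ((x - x̄)/s)³. The sketch below implements that formula. The raw data are not reproduced in this section, so the sketch runs on small hypothetical samples rather than the project data. It shows that a long right tail gives a positive value and a symmetric sample gives (numerically) zero. The helper name `excel_skew` is our own.

```python
import statistics


def excel_skew(values: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness, the formula behind Excel's SKEW()."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical weights (kg), NOT the project data.
right_tailed = [52, 55, 56, 58, 58, 60, 61, 77]  # one unusually heavy value
symmetric = [50, 55, 60, 65, 70]

print(round(excel_skew(right_tailed), 3))  # positive: longer tail towards higher weights
print(round(excel_skew(symmetric), 3))     # approximately zero for a symmetric sample
```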
The weights spanned a range of 27 kg, from a minimum of 50 kg to a maximum of 77 kg. The sample variance, a measure of how widely the weights are spread around the mean, was approximately 45.51 kg².
The majority of students have weights between 55.4 kg and 71.6 kg, with the highest frequency in the 55.4 kg to 60.8 kg range. The frequencies are lower in the higher weight ranges. The data are therefore concentrated at the lower end, with a longer tail towards higher weights, which is consistent with the positive skewness noted above.
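The boundaries 55.4, 60.8 and 71.6 kg quoted above are consistent with five equal-width bins running from the minimum (50 kg) to the maximum (77 kg), each 27/5 = 5.4 kg wide. That binning is our inference from those numbers; the report does not state it. The sketch below simply computes such edges. The helper name `equal_width_edges` is hypothetical.

```python
def equal_width_edges(minimum: float, maximum: float, bins: int) -> list[float]:
    """Boundaries of `bins` equal-width intervals covering [minimum, maximum]."""
    width = (maximum - minimum) / bins
    # round() removes floating-point noise such as 60.800000000000004
    return [round(minimum + k * width, 10) for k in range(bins + 1)]


edges = equal_width_edges(50, 77, 5)  # weight minimum and maximum from the table
print(edges)  # [50.0, 55.4, 60.8, 66.2, 71.6, 77.0]
```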
The average height of the students was approximately 165.67 cm, with a standard deviation of 6.21 cm, indicating a moderate spread of heights around the mean. The height distribution exhibited a skewness of -0.10, which means it is nearly symmetric.
The heights spanned a range of 25 cm, from 154 cm (shortest) to 179 cm (tallest). The most common height among the students was 170 cm. The sample variance was approximately 38.51 cm².
Figure: Histogram of height (x-axis: Bin, with bins 154, 159, 164, 169, 174 and More; left axis: Frequency; right axis: Cumulative %)
The largest number of students have heights between 164 cm and 169 cm, the most frequent bin. The distribution appears to be slightly left-skewed, in line with its skewness of -0.10.
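Excel's Histogram tool counts a value in the first bin whose upper limit is greater than or equal to that value. Anything above the last bin goes into a "More" bucket. The cumulative % column is the running count divided by the number of observations. The sketch below mimics that behaviour with `bisect`, using the same bin limits as the height histogram. The heights are hypothetical, not the 30 measurements in the dataset, and `excel_histogram` is an illustrative helper rather than an Excel API.

```python
import bisect


def excel_histogram(values, bins):
    """Frequency and cumulative % in the style of Excel's Histogram tool.

    A value is counted in the first bin whose upper limit is >= the value;
    values above the last bin go into a final "More" bucket.
    """
    counts = [0] * (len(bins) + 1)  # the last slot is "More"
    for v in values:
        counts[bisect.bisect_left(bins, v)] += 1
    total = len(values)
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(100 * running / total)
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts, cumulative))


bins = [154, 159, 164, 169, 174]  # bin limits used in the height histogram
heights = [154, 158, 160, 163, 165, 166, 168, 170, 170, 175]  # hypothetical sample
result = excel_histogram(heights, bins)

for label, count, cum in result:
    print(f"{label:>5} {count:3d} {cum:7.2f}%")
```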
2. Inferential Statistics:
Analysis: Construct a confidence interval at the 5% significance level (that is, a 95% confidence interval) for the average height of all female students.
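From this analysis, we conclude that a 95% confidence interval for the average height of all female students is approximately 163.35 cm to 167.98 cm (the sample mean of 165.67 cm plus or minus Excel's 95% confidence level of 2.32 cm).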
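The interval is the sample mean plus or minus t × s/√n. The sketch below reproduces it from the descriptive table, taking the mean as Sum/Count. It assumes the t critical value t(0.025, 29) ≈ 2.045230, hardcoded because the standard library has no t quantile function. The function name `mean_confidence_interval` is hypothetical.

```python
import math

# Assumed two-sided 95% critical value t(0.025, 29), hardcoded because the
# standard library has no Student's t quantile function.
T_CRIT_95_DF29 = 2.045230


def mean_confidence_interval(mean: float, sd: float, n: int,
                             t_crit: float) -> tuple[float, float]:
    """t-based confidence interval for a population mean: mean +/- t * s / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


low, high = mean_confidence_interval(4970 / 30, 6.2053, 30, T_CRIT_95_DF29)
print(f"95% CI for the mean height: ({low:.2f} cm, {high:.2f} cm)")
```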
Research question for hypothesis testing: the average height of all female students is 164 cm.
Test this claim at the 10% significance level based on the data.
Hypotheses:
Null Hypothesis (H0): The average height of all female students is 164 cm.
Alternative Hypothesis (H1): The average height of all female students is not 164 cm.
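Significance level alpha                      10%
Sample size n                                 30
Sample mean x̄                                 165.67
Sample standard deviation s                   6.21
Hypothesized mean μ0                          164
Test statistic t0 = (x̄ - μ0) / (s / √n)       1.471115 (computed from the unrounded mean and standard deviation)
Right critical value t(α/2, n-1)              1.70
Left critical value -t(α/2, n-1)              -1.70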
Because t0 ≈ 1.47 lies within the acceptance region (-1.70, 1.70), we fail to reject the null hypothesis. There is not enough evidence at the 10% significance level to conclude that the average height of all female students differs from 164 cm.
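The test above can be reproduced in a few lines. This is a sketch under stated assumptions. The mean is taken as Sum/Count from the descriptive table. The two-sided critical value at α = 10% is hardcoded as t(0.05, 29) ≈ 1.699127, since the standard library has no t quantile function. The function name `one_sample_t` is our own.

```python
import math

# Assumed two-sided critical value at alpha = 10%: t(0.05, 29) ~ 1.699127.
T_CRIT_90_DF29 = 1.699127


def one_sample_t(mean: float, sd: float, n: int, mu0: float) -> float:
    """Test statistic t0 = (x_bar - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n))


t0 = one_sample_t(4970 / 30, 6.2053, 30, 164)
reject = abs(t0) > T_CRIT_90_DF29  # two-sided rejection rule
print(f"t0 = {t0:.4f}:", "reject H0" if reject else "fail to reject H0")
```

The same decision follows with the rounded inputs 165.67 and 6.21 (t0 ≈ 1.47), so the conclusion does not depend on rounding.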
Analysis: Test the claim that the percentage of female students in the world with weight under 65 kg is 60%.
Null Hypothesis (H0): The percentage of female students with weight under 65 kg in the world is equal to 60%.
Alternative Hypothesis (H1): The percentage of female students with weight under 65 kg in the world is not equal to 60%.
Sample proportion p̂                                  0.80
Test statistic z0 = (p̂ - p0) / √(p0(1 - p0)/n)      2.24
Right critical value z(α/2)                          1.96
Left critical value -z(α/2)                          -1.96
Because z0 ≈ 2.24 lies outside the acceptance region (-1.96, 1.96), we reject the null hypothesis (H0). This provides evidence that the percentage of female students with weight under 65 kg in the world is not equal to 60%.
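A minimal sketch of this one-proportion z test follows. It uses only the standard library's `statistics.NormalDist`. It assumes a two-sided test at α = 5%, which is what the ±1.96 critical values imply; the report does not state the significance level for this test. p̂ = 0.80 with n = 30 corresponds to 24 students. The p-value line is an extra we added; it is not part of the original output.

```python
import math
from statistics import NormalDist


def one_proportion_z(p_hat: float, p0: float, n: int) -> float:
    """Test statistic z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


alpha = 0.05     # assumed: the +/-1.96 critical values imply a two-sided 5% test
p_hat = 24 / 30  # 0.80, i.e. 24 of the 30 students weigh under 65 kg
z0 = one_proportion_z(p_hat, 0.60, 30)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))  # two-sided p-value

print(f"z0 = {z0:.2f}, critical = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z0) > z_crit else "fail to reject H0")
```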
For the linear regression analysis, we set height (cm) as the predictor “X” and weight (kg) as the response “Y”. Using the Regression tool from Excel's Analysis ToolPak, we generated the following summary output:
Regression Statistics
Multiple R             0.875052
R Square               0.765716
Adjusted R Square      0.757349
Standard Error         3.323203
Observations           30
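The multiple correlation coefficient (R) measures the strength of the linear relationship between the predictor and the response variable. With a single predictor, R equals the absolute value of the Pearson correlation between height and weight, so the direction of the relationship must be read from the sign of the slope. Here R is approximately 0.88 and the slope is positive. This indicates a strong positive correlation between the predictor (height) and the response (weight).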
The R Square value is approximately 0.77, meaning that about 77% of the variability in weight can be explained by its linear relationship with height.
The adjusted R Square corrects R Square for the number of predictors relative to the sample size. Its value of approximately 0.76 indicates that, after this correction, the model explains about 76% of the variance in weight.
The standard error of the regression is approximately 3.32 kg. It is the estimated standard deviation of the residuals (computed with n - 2 degrees of freedom). It therefore describes the typical distance between the observed weights and the weights predicted by the regression line.
The number of observations is the sample size used in the regression analysis; in this case, n = 30.
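Excel computes these statistics from the raw data. They can also be recovered from the summary values already reported, which serves as a consistency check. This sketch assumes a single predictor (k = 1). It takes the sample variance of weight from the descriptive table to form the total sum of squares. All variable names are our own.

```python
import math

n, k = 30, 1              # observations and number of predictors (height only)
multiple_r = 0.875052     # Excel "Multiple R"
var_y = 45.51264          # sample variance of weight, from the descriptive table

r_square = multiple_r ** 2
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
sst = var_y * (n - 1)                           # total sum of squares of weight
sse = (1 - r_square) * sst                      # residual sum of squares
standard_error = math.sqrt(sse / (n - k - 1))   # residual SD with n - 2 df

print(round(r_square, 6), round(adj_r_square, 6), round(standard_error, 4))
```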
ANOVA table:
We performed an ANOVA (analysis of variance) to assess the significance of the regression model. The results showed that the model was highly significant (p < 0.001), indicating that height has a statistically significant linear relationship with weight.
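The ANOVA F statistic can be written in terms of R Square as F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below evaluates it from the summary output. The ANOVA table itself is not reproduced here, so the value is our recomputation rather than a quoted result. The 5% critical value of F(1, 28) ≈ 4.196 is hardcoded as an assumption, because the standard library has no F quantile function.

```python
n, k = 30, 1          # observations and number of predictors (height only)
r_square = 0.765716   # Excel "R Square" from the summary output

# Overall F test: explained mean square / residual mean square, expressed via R^2.
f_stat = (r_square / k) / ((1 - r_square) / (n - k - 1))

# Assumed 5% critical value of F(1, 28), hardcoded (no F quantile in the stdlib).
F_CRIT_05 = 4.196

print(f"F = {f_stat:.2f}, exceeds 5% critical value: {f_stat > F_CRIT_05}")
```

With a single predictor, this F statistic equals the square of the slope's t statistic. The ANOVA test and the t test on X Variable 1 therefore give the same p-value.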
Further examination of the coefficients shows an intercept of approximately -97.87. The intercept is statistically significant but has no practical interpretation, since it corresponds to a height of 0 cm. The predictor (labelled X Variable 1 by Excel, i.e. height) has a significant positive coefficient of approximately 0.95. This implies that for every 1 cm increase in height, weight is expected to increase by about 0.95 kg.
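For simple linear regression, the least-squares slope is r × s_Y / s_X and the intercept is ȳ - slope × x̄. The sketch below recomputes both from the descriptive table and the Multiple R value to confirm the Excel coefficients. It assumes r = +Multiple R, which holds here because the slope is positive.

```python
# Summary statistics from the descriptive table (means computed as Sum / Count).
mean_x, sd_x = 4970 / 30, 6.2053      # height (cm), predictor X
mean_y, sd_y = 1792 / 30, 6.746306    # weight (kg), response Y
r = 0.875052  # Excel's Multiple R; assumed equal to Pearson's r since the slope is positive

# Least-squares line: slope r * sd_y / sd_x, passing through (mean_x, mean_y).
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

print(f"Y = {intercept:.2f} + {slope:.4f} * X")
```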
From the regression output above, we obtain a formula for calculating Y from X: Y = -97.87 + 0.95X.
Example: Given a student in the class whose height is 170 cm, predict her weight.
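A worked version of this example simply substitutes X = 170 into the fitted line. The sketch below uses the report's rounded coefficients (-97.87 and 0.95) by default. It also shows the effect of a less-rounded slope (about 0.951343, recomputed from the summary statistics rather than read from the Excel output). The function name `predict_weight` is hypothetical.

```python
def predict_weight(height_cm: float, intercept: float = -97.87,
                   slope: float = 0.95) -> float:
    """Predicted weight (kg) from the fitted line Y = intercept + slope * X."""
    return intercept + slope * height_cm


print(round(predict_weight(170), 2))                  # 63.63 with the rounded coefficients
print(round(predict_weight(170, slope=0.951343), 2))  # 63.86 with the less-rounded slope
```

Rounding the slope to 0.95 shifts the prediction at 170 cm by roughly 0.2 kg. For that reason, less-rounded coefficients are preferable when reporting predictions.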
correctly predict the weights given their heights. The output table is given below.
Figure: Plot against X Variable 1 (height in cm, x-axis from 150 to 185)
As the output shows, the regression model estimates the weights quite accurately. This suggests that the model describes the relationship between the weights and heights of the 30 examined female students well.
IV. Conclusion:
In conclusion, this data analysis project examined the heights and weights of 30 female students. The project used descriptive statistics to summarize the data, inferential statistics to draw conclusions and make predictions, and visual representations to present the results.
The descriptive analysis showed that the average height of the female students was approximately 165.67 cm, with a standard deviation of 6.21 cm, indicating a moderate level of variability. The average weight was approximately 59.73 kg, with a standard deviation of 6.75 kg. The height distribution was close to symmetric (skewness -0.10), while the weight distribution was moderately right-skewed (skewness 0.84).
Inferential statistics were used to explore the relationship between the variables and to test its statistical significance. The results indicated a strong positive correlation between height and weight.
Visual representations, including graphs, tables, and frequency distributions, were created to communicate the findings more clearly. These visuals helped present the results effectively.
Overall, this project successfully analyzed the heights and weights of the female students,
providing valuable insights into their distributions, relationships, and characteristics. The