Statistics
2022-2023
Group members:
AN Bronith ID: e20190006
CHEA Puthpisakh ID: e20200343
CHHUON Ratanakvatey ID: e20200131
HAM Chetra ID: e20200546
HOR Songhak ID: e20200010
Lecturers:
Mr. PHOK Ponna (Course)
Mr. TOUCH Sopheak (TD)
1 Introduction
In this project, we apply complete case analysis to the dataset; the variables of interest
are manufacturer, model and cty. We present a summary statistics table giving the
number of cars for each sub-brand and the average, variance, standard deviation, minimum
and maximum of city miles per gallon. The aim of this project is to analyze the city fuel
consumption of cars of the same brand.
library(ggplot2)
data(mpg)
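The complete case selection described above can be sketched as follows. This is a minimal sketch rather than the exact code used for the report: the object names `vars` and `mpg_cc` are illustrative assumptions, and only the three variables of interest are kept before incomplete rows are dropped.

```r
library(ggplot2)  # provides the mpg dataset

data(mpg)

# Keep only the three variables of interest (column names as they appear in mpg)
vars <- mpg[, c("manufacturer", "model", "cty")]

# Complete case analysis: keep only rows with no missing value in any column
mpg_cc <- vars[complete.cases(vars), ]

# Number of rows before and after dropping incomplete cases
nrow(vars)
nrow(mpg_cc)
```

Comparing the two row counts shows how many observations, if any, were lost to missing values.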
2. Two boxplots of city miles per gallon for the sub-brands of “Audi” and “Volkswagen”.
2.4 Statistical Software
R Programming will be used for statistical analysis.
3 Results
3.1 Data Exploration
3.1.1 Summary statistics table
There were 3 sub-brands of Audi. A4 and A4 Quattro had similar results, while A6
Quattro had too few cars (3) to compare; A4 Quattro had the most cars (8). City miles
per gallon ranged from a minimum of 15 to a maximum of 21. From Table 1, however,
A4 had the higher average (18.9), variance (3.48) and standard deviation (1.86).
There were 4 sub-brands of Volkswagen, summarised in Table 2.
From Table 2, Jetta had the most cars (9) and GTI had the fewest (5). In city
miles per gallon, the minimum was 16 (Jetta and Passat) and the maximum was 35 (New
Beetle). Similarly, Passat had the lowest average (18.6) and New Beetle the highest
(24). However, the variances of Jetta and New Beetle were large (23.7 and 42.4, with
standard deviations of 4.87 and 6.51 respectively). This indicates that the data
points are widely spread around the mean, and from one another.
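Because the variance is the square of the standard deviation, it is expressed in squared units and is always larger than the standard deviation whenever the latter exceeds 1. The sketch below checks this relation on the New Beetle values and computes the coefficient of variation as one unit-free way to judge spread. The coefficient of variation is our addition for illustration, not part of the original analysis, and the variable names are assumptions.

```r
library(ggplot2)  # provides the mpg dataset

# City miles per gallon for the Volkswagen New Beetle
beetle_cty <- mpg$cty[mpg$manufacturer == "volkswagen" &
                        mpg$model == "new beetle"]

s  <- sd(beetle_cty)   # sample standard deviation (miles per gallon)
s2 <- var(beetle_cty)  # sample variance (squared miles per gallon)

# The variance equals the squared standard deviation
all.equal(s2, s^2)

# Coefficient of variation: standard deviation relative to the mean
cv <- s / mean(beetle_cty)
cv
```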
3.1.2 Graphs
Figure 1: Histogram of the distribution of city miles per gallon for the sub-brands of “Audi”.
Figure 1 shows the distribution of city miles per gallon for the sub-brands Audi A4
(blue) and Audi A4 Quattro (red). The histogram covers the 15 observations from the
2 sub-brands. It suggests the presence of some heterogeneity, as the sub-brands do not
all appear to share the same city miles per gallon. This indicates that the sub-brands
may be clustered with respect to city miles per gallon, in which case a finite mixture
model would be needed.
Figure 2: Boxplots of city miles per gallon for the sub-brands of “Audi”.
Figure 2 indicates the range in which the middle 50% of all values lie. It shows that city
miles per gallon for Audi A4 Quattro appear approximately normally distributed, while
Audi A4 has a right-skewed distribution. There are no outliers in either group.
Figure 3: Boxplots of city miles per gallon for the sub-brands of “Volkswagen”.
Figure 3 indicates the range in which the middle 50% of all values lie. It shows that
city miles per gallon for Volkswagen Passat and Volkswagen New Beetle are right-skewed,
while Volkswagen Jetta and Volkswagen GTI have left-skewed distributions.
It is worth pointing out that Jetta and GTI have the same median. However, Jetta
has one outlier (around 33) that needs to be addressed before further calculation.
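The outlier seen in Figure 3 can be checked numerically. The sketch below applies the 1.5 × IQR rule, the default whisker rule of `geom_boxplot`, to the Jetta values. The variable names are illustrative and the snippet is not part of the original code.

```r
library(ggplot2)  # provides the mpg dataset

# City miles per gallon for the Volkswagen Jetta
jetta_cty <- mpg$cty[mpg$manufacturer == "volkswagen" & mpg$model == "jetta"]

# First and third quartiles: the box spans Q1 to Q3
q   <- quantile(jetta_cty, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Fences at 1.5 * IQR beyond the box; points outside are drawn as outliers
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

outliers <- jetta_cty[jetta_cty < lower | jetta_cty > upper]
outliers
```

How to handle a flagged point (keeping it, removing it, or repeating the analysis both ways) is a separate decision that this check does not make.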
Table 3: ANOVA of city miles per gallon for “Audi A4” and “Audi A4 Quattro”.
Test H0: µ1 = µ2 vs. Ha: µ1 ≠ µ2
From Table 3,
P-value = 0.091 > α = 0.05
Then H0 is not rejected.
Hence, at the 5% level, there is no evidence that “Audi A4” and “Audi A4 Quattro” differ in city fuel consumption.
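With only two groups, a one-way ANOVA is equivalent to a two-sample t-test with pooled variance: the F statistic is the square of the t statistic and the p-values coincide. The sketch below illustrates this on the two Audi sub-brands. The object names are assumptions, and the rows are selected directly from `mpg` rather than through the objects used in the appendix.

```r
library(ggplot2)  # provides the mpg dataset

# Audi A4 and A4 Quattro only
ad2 <- subset(mpg, manufacturer == "audi" & model %in% c("a4", "a4 quattro"))

# One-way ANOVA of cty by model
fit       <- aov(cty ~ model, data = ad2)
anova_tab <- summary(fit)[[1]]
f_val     <- anova_tab[["F value"]][1]
p_aov     <- anova_tab[["Pr(>F)"]][1]

# Two-sample t-test assuming equal variances (pooled)
tt <- t.test(cty ~ model, data = ad2, var.equal = TRUE)

# F equals t squared, and both tests give the same p-value
all.equal(f_val, unname(tt$statistic)^2)
all.equal(p_aov, tt$p.value)

# Decision at the 5% significance level
alpha <- 0.05
p_aov > alpha  # TRUE means H0 is not rejected
```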
Table 4: ANOVA of city miles per gallon for the sub-brands of “Volkswagen”.
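The test behind Table 4 has the same form as the Audi test, with four groups instead of two. The following is a minimal sketch, assuming the Volkswagen rows are taken directly from `mpg`. The names `vw`, `vw_fit` and `p_vw` are illustrative and do not come from the original code.

```r
library(ggplot2)  # provides the mpg dataset

# Volkswagen rows only (assumed selection, not shown in the original listing)
vw <- subset(mpg, manufacturer == "volkswagen")

# One-way ANOVA: H0 states that all sub-brand means of cty are equal
vw_fit <- aov(cty ~ model, data = vw)
vw_tab <- summary(vw_fit)[[1]]
p_vw   <- vw_tab[["Pr(>F)"]][1]

# Decision at the 5% significance level used in this report
alpha <- 0.05
if (p_vw > alpha) {
  message("H0 is not rejected")
} else {
  message("H0 is rejected")
}
```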
4 Conclusion
Using the analysis of variance (ANOVA) test, we tested H0, that the mean city miles
per gallon is the same for every sub-brand, against Ha, that at least two means differ.
We chose a significance level (α) of 5%. We conclude that, within these two car brands
(Audi and Volkswagen), the tests found no difference in city fuel consumption among the
sub-brands. However, failing to reject H0 does not give us enough evidence to conclude
that their consumption is exactly equal.
Appendix
Listing 1: R code
library(ggplot2) #for dataset and plotting
library(magrittr) #for pipe operator
library(dplyr) #for group_by
data(mpg) #dataset
View(mpg) #to see the dataset
names(mpg) #to view the variable
mpg_no_na <- na.omit(mpg) #to remove missing data
#subset by manufacturer
audi <- subset(mpg_no_na, manufacturer == "audi")
volkswagen <- subset(mpg_no_na, manufacturer == "volkswagen")
#Statistic table
audi %>%
  group_by(model) %>%
  summarise(Amount = n(), Average = mean(cty), Var = var(cty), Std = sd(cty),
            Min. = min(cty), Max = max(cty))
volkswagen %>%
  group_by(model) %>%
  summarise(Amount = n(), Average = mean(cty), Var = var(cty), Std = sd(cty),
            Min. = min(cty), Max = max(cty))
###select only a4 and a4 quattro
ad_no_a6 <- rbind(subset(audi, model == "a4"),
                  subset(audi, model == "a4 quattro"))
#histogram
ggplot(ad_no_a6, aes(cty, fill = model)) +
  geom_histogram(binwidth = 1, color = "black",
                 alpha = 0.8, position = "identity") +
  scale_fill_manual(values = c("blue", "red")) +
  ggtitle('Histogram of the city mile per gallon for sub-brand of "Audi"') +
  labs(x = "City miles per gallon", y = "Frequency") +
  theme_classic()
ggsave("ad_hist.png", width = 6, height = 5, dpi = 300) #save
#boxplot
##audi
ggplot(ad_no_a6, aes(cty, model, fill = model)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle('Boxplots of the city mile per gallon for sub-brand of "Audi"') +
  labs(x = "City miles per gallon", y = "Audi's Model") +
  theme_classic()
ggsave("ad_boxplot.png", width = 7, height = 2, dpi = 300) #save
##volkswagen
ggplot(volkswagen, aes(cty, model, fill = model)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle('Boxplots of the city mile per gallon for sub-brand of "Volkswagen"') +
  labs(x = "City miles per gallon", y = "Volkswagen's Model") +
  theme_classic()
ggsave("vol_boxplot.png", width = 7, height = 3.5, dpi = 300) #save
#anova test
ad_ano <- aov(cty ~ model, data = ad_no_a6)
summary(ad_ano)
The End