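Standard Deviation: describes how much the data points in a sample or population deviate from their mean (the spread of the data).
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Obs: the standard deviation is just the square root of the variance ($s = \sqrt{s^2}$, $\sigma = \sqrt{\sigma^2}$).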
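A minimal Python sketch (standard library only; the data values are hypothetical, not from the exercise) showing the difference between the sample formula, which divides by n - 1, and the population formula, which divides by N:

```python
import statistics

# Hypothetical data, not from the exercise
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)   # divides by n - 1
pop_var = statistics.pvariance(data)     # divides by N
sample_sd = statistics.stdev(data)       # square root of the sample variance
pop_sd = statistics.pstdev(data)         # square root of the population variance

print(sample_var, pop_var)
print(sample_sd, pop_sd)
```

For this data the population variance is 4 and the population standard deviation is 2, while the sample versions are slightly larger because they divide by 7 instead of 8.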
Question 1
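d) Standard Error: describes how uncertain the estimate of a parameter is (ex: the sample mean as an estimate of the population mean). It is the standard deviation of the sampling distribution of the estimate; for the mean, $SE = s/\sqrt{n}$.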
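A minimal sketch of the standard error of the mean, $s/\sqrt{n}$, using Python's standard library; the function name and the sample values are hypothetical:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
print(round(se, 3))
```

Because of the $\sqrt{n}$ in the denominator, collecting more data shrinks the standard error even when the standard deviation stays the same.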
f) Type I error (false positive): one concludes that there is an interesting difference when in fact there isn't any, i.e. one rejects the null hypothesis when it is true. The probability of a type I error is the significance level alpha (the false positive rate).
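A small simulation sketch of what "the type I error rate equals alpha" means. It assumes a two-sample z-test with known sigma = 1 (chosen only to stay within the standard library) and draws both groups from the same population, so H0 is true and every rejection is a false positive:

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(n_sims=10_000, n=20, alpha=0.05, seed=1):
    """Simulate two-sample z-tests where H0 is true (both groups come from
    the same N(0, 1) population) and return the share of rejections."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = (2 / n) ** 0.5  # SE of the mean difference, sigma = 1 assumed known
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1  # false positive: H0 is true but rejected
    return rejections / n_sims

rate = type_one_error_rate()
print(rate)  # should be close to alpha = 0.05
```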
a) Median: splits the data into the 50% lowest and 50% highest values.
1st quartile: splits the data into the 25% lowest and 75% highest values. (In this case, the 1st quartile is 8.)
3rd quartile: splits the data into the 75% lowest and 25% highest values. (In this case, the 3rd quartile is 18.)
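A short sketch computing the median and quartiles with Python's `statistics` module on hypothetical data. Quartile definitions differ between software; `method="inclusive"` uses the same linear interpolation as R's default `quantile()` (type 7):

```python
import statistics

# Hypothetical data (not the data from the exercise)
data = [3, 5, 7, 9, 11, 13, 15, 17, 19]

median = statistics.median(data)
# method="inclusive" interpolates like R's default quantile() (type 7)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, median, q3)
```

The middle cut point `q2` is the median, so 25% of the values lie below `q1`, 50% below `q2` and 75% below `q3`.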
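Unpaired (independent) two-sample test, because we have two samples of frequencies (male and female) that are not related to each other: no observation in one sample is paired with an observation in the other.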
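A minimal sketch of the unpaired two-sample t statistic with pooled variance (Student's version). The group values are hypothetical, and only the statistic and degrees of freedom are computed, because the standard library has no t distribution for the p-value (in R, `t.test(x, y, var.equal = TRUE)` reports both):

```python
import math
import statistics

def pooled_t(a, b):
    """Student's unpaired two-sample t statistic with pooled variance.
    Returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, na + nb - 2

# Hypothetical frequencies for two unrelated groups
male = [1, 2, 3, 4, 5]
female = [3, 4, 5, 6, 7]
t, df = pooled_t(male, female)
print(t, df)
```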
Question 4
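a) H0: all auxiliaries have the same frequency ($f_{hebben} = f_{zijn} = f_{zijnheb}$).
b) With 212 + 15 + 58 = 285 observations, the expected count for each auxiliary under H0 is 285 / 3 = 95:
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(212 - 95)^2}{95} + \frac{(15 - 95)^2}{95} + \frac{(58 - 95)^2}{95} \approx 225.87$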
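-> Calculating the expected number of observations for regular verbs with auxiliary zijn (expected cell count in a contingency table): $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$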
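Expected counts in a contingency table are computed as row total × column total / grand total. A minimal sketch on a hypothetical 2 × 2 table (the counts are made up, not the exercise data):

```python
def expected_counts(table):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table: rows = regular / irregular verbs,
# columns = hebben / zijn (made-up counts, not the exercise data)
table = [[30, 10],
         [20, 40]]
expected = expected_counts(table)
print(expected)
```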
Recap:
dchisq(x, df, ncp = 0, log = FALSE) # density function; to see what the χ2 distributions look like, we use dchisq().
qchisq(1 - alpha, df) -> gives the rejection (critical) value for a certain alpha-value; pchisq(x, df, lower.tail = FALSE) gives the p-value. Then decide whether the Chi-Square value falls within or outside the rejection region.
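A standard-library sketch of the goodness-of-fit test for the Question 4 counts (212, 15, 58). It relies on a shortcut that holds only for df = 2: the χ2 distribution with 2 degrees of freedom is exponential with mean 2, so $P(X > x) = e^{-x/2}$ and the critical value is $-2\ln(\alpha)$. For other degrees of freedom you would need R's `qchisq()`/`pchisq()` or a statistics library:

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [212, 15, 58]                         # counts from Question 4
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # 95 each under H0
chi2 = chi_square_statistic(observed, expected)

# Only valid for df = 2: P(X > x) = exp(-x / 2), critical value = -2 * ln(alpha)
alpha = 0.05
df = len(observed) - 1
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), round(critical, 3), chi2 > critical)
```

The statistic (about 225.87) lies far inside the rejection region (critical value about 5.99, the same as `qchisq(0.95, 2)` in R), so H0 is rejected.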
For χ2 = 0.54 and 4.13 -> outside the rejection region, so we do not reject the null hypothesis.
For χ2 = 8.23 -> within the rejection region, so we reject the null hypothesis.
Question 5
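a) Alpha = 0.10, two-tailed (0.05 in each tail).
b) For alpha = 0.05 and for alpha = 0.10, a value of 1.62 would not lead us to reject the null hypothesis. At alpha = 0.05 the rejection region is > 2 or < -2, and at alpha = 0.10 it is > 1.7 or < -1.7 (approximate critical values); 1.62 falls outside both.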
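A sketch of the two-tailed decision rule. The critical values in the answer (about 2 and 1.7) look like t-distribution values for moderate degrees of freedom; this sketch instead assumes the standard normal approximation (critical values about 1.96 and 1.645), since the standard library has no t distribution. The conclusion for 1.62 is the same either way:

```python
from statistics import NormalDist

def two_tailed_critical(alpha):
    """Critical value c such that P(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

statistic = 1.62
decisions = {}
for alpha in (0.05, 0.10):
    c = two_tailed_critical(alpha)
    decisions[alpha] = abs(statistic) > c   # True means "reject H0"
    print(alpha, round(c, 3), decisions[alpha])
```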
Question 6
a) - Homogeneity of Variance:
Levene test: tests whether the groups have the same variance.
H0: the group variances are equal; p > 0.05 gives no indication that the homogeneity of variance assumption is violated.
H1: the group variances differ; p < 0.05 indicates that homogeneity of variance is violated.
- Normality of Residuals:
Shapiro-Wilk test: tests whether the data at hand (here, the residuals) are normally distributed (bell shape).
H0: the residuals are normally distributed; p > 0.05 gives no indication that normality is violated.
H1: the residuals are not normally distributed; p < 0.05 indicates that normality is violated.
Alternatively, inspect a QQ-plot of the residuals.
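A minimal sketch of Levene's W statistic in its original mean-centred form: a one-way ANOVA F computed on the absolute deviations of each value from its group mean. Some implementations centre on the median instead (the Brown-Forsythe variant). The groups are hypothetical, and the p-value, which comes from an F(k - 1, N - k) distribution, is not computed because the standard library lacks that distribution:

```python
import statistics

def levene_w(*groups):
    """Levene's W statistic (mean-centred version): a one-way ANOVA F
    computed on the absolute deviations from each group's mean."""
    z = [[abs(y - statistics.fmean(g)) for y in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in z)
    z_means = [statistics.fmean(g) for g in z]
    grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups with increasing spread
w = levene_w([1, 2, 3], [2, 4, 6], [1, 5, 9])
print(w)
```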
b)
c)
As p < 0.05, we reject H0: at least one group mean differs from the others (the ANOVA itself does not tell us which groups differ).
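A minimal sketch of the one-way ANOVA F statistic, $F = MS_{between} / MS_{within}$, on hypothetical groups. The p-value would come from an F distribution with (k - 1, N - k) degrees of freedom, which the standard library does not provide:

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    all_values = [v for g in groups for v in g]
    grand = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w

f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])  # hypothetical data
print(f, df_b, df_w)
```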
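d) Run post hoc pairwise comparisons, e.g. pairwise.t.test() in R, which adjusts the p-values for multiple comparisons (Holm's method by default).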
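A sketch of Holm's step-down adjustment, the same rule as R's `p.adjust(p, method = "holm")`: sort the p-values, multiply the smallest by m, the next by m - 1, and so on, cap at 1, and keep the results non-decreasing. The raw p-values are hypothetical:

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of p-values (returned in the input order)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, value)  # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for three pairwise comparisons
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

Without the adjustment, running several t-tests at alpha = 0.05 each would inflate the overall chance of at least one type I error.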
e) Eta squared ($\eta^2$) measures the effect size in ANOVA, just as Cohen's d does for t-tests. It is the proportion of the variability in the outcome variable that can be explained by the predictor: $\eta^2 = SS_{between} / SS_{total}$.
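A minimal sketch computing $\eta^2 = SS_{between} / SS_{total}$ for a one-way design on hypothetical groups:

```python
import statistics

def eta_squared(*groups):
    """Eta squared = SS_between / SS_total for a one-way design."""
    all_values = [v for g in groups for v in g]
    grand = statistics.fmean(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    return ss_between / ss_total

eta2 = eta_squared([1, 2, 3], [4, 5, 6], [7, 8, 9])  # hypothetical data
print(eta2)
```

Here 90% of the total variability is explained by group membership, a very large effect (the groups barely overlap).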
Question 7
a) Pearson correlation ≈ 0.61: a moderate positive correlation. However, the p-value is > 0.05, which means that we retain H0 (the population correlation is 0, i.e. there is no correlation between grade and hours slept): the sample does not provide enough evidence of a correlation.
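A minimal sketch of Pearson's r and the t statistic used to test H0: ρ = 0, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n - 2 degrees of freedom (the test R's `cor.test()` performs). The data are hypothetical, and the p-value is not computed because the standard library has no t distribution:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical hours slept and grades
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]
r = pearson_r(hours, grades)
t = correlation_t(r, len(hours))
print(round(r, 3), round(t, 3))
```

With very few observations, even a fairly large r can give a t statistic that is not significant, which is how a correlation of 0.61 can come with p > 0.05.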
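b) The confidence interval for the correlation is (-0.035, 0.89). For a 95% confidence interval: if the experiment were repeated many times, about 95% of the intervals constructed this way would contain the true population correlation. Because this interval includes 0, it is consistent with retaining H0 in a).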
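A sketch of the usual way a confidence interval for Pearson's r is built (and, to my knowledge, the approach R's `cor.test()` uses): transform r with Fisher's $z = \operatorname{atanh}(r)$, take $z \pm z_{crit}/\sqrt{n-3}$, and transform back with tanh. The inputs r = 0.5 and n = 30 are hypothetical:

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate CI for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical values: r = 0.5 from n = 30 pairs
low, high = correlation_ci(0.5, 30)
print(round(low, 3), round(high, 3))
```

The back-transformation makes the interval asymmetric around r, which is why intervals like (-0.035, 0.89) are not centred on the estimate.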
c) Intercept: the predicted grade for a student who slept 0 hours (not the minimum grade).
Slope: how much the predicted grade increases for each additional hour slept.
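A minimal ordinary-least-squares sketch showing where the intercept and slope come from: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The hours and grades are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

hours = [1, 2, 3, 4, 5]   # hypothetical
grades = [2, 4, 5, 4, 5]  # hypothetical
intercept, slope = fit_line(hours, grades)
print(intercept, slope)
```

Here the fitted line is about grade = 2.2 + 0.6 × hours: each extra hour of sleep predicts 0.6 more grade points, and 2.2 is the predicted grade at 0 hours.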
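d) Residuals are the errors in the prediction: the difference between each observed value $y_i$ and its predicted value $\hat{y}_i$, i.e. $e_i = y_i - \hat{y}_i$.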
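e) The second model is better, given that both its R-Squared and its Adjusted R-Squared are greater than those of the first model. Adjusted R-Squared is the fairer comparison: R-Squared never decreases when predictors are added to a model, whereas Adjusted R-Squared penalizes extra predictors.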
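A minimal sketch computing residuals, $R^2 = 1 - SS_{res}/SS_{tot}$ and adjusted $R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ (p = number of predictors). The data are hypothetical, and the fitted values come from their least-squares line, grade = 2.2 + 0.6 × hours:

```python
def r_squared(y, y_hat, n_predictors):
    """Return (R^2, adjusted R^2) from observed and fitted values."""
    n = len(y)
    mean_y = sum(y) / n
    residuals = [a - b for a, b in zip(y, y_hat)]    # e_i = y_i - y_hat_i
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj

# Hypothetical data and their least-squares line y_hat = 2.2 + 0.6 * x
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * h for h in hours]
r2, adj_r2 = r_squared(grades, fitted, n_predictors=1)
print(round(r2, 3), round(adj_r2, 3))
```

The adjusted value is always below R-squared when there is at least one predictor, and the gap grows as predictors are added relative to the sample size.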
Question 8
a) Both tests compare the means of two groups/independent samples. Welch's test is used when the homoscedasticity assumption does not hold (the population standard deviation is not the same in both groups).
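A minimal sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom, which estimate each group's variance separately instead of pooling them. R's `t.test()` uses this version by default (`var.equal = FALSE`). The groups are hypothetical, and only the statistic and df are computed:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    (no equal-variance assumption)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical groups with clearly different spreads
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

The degrees of freedom are usually not a whole number and are smaller than n1 + n2 - 2, which makes the test more conservative when the variances differ.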
b) The Chi-Square statistic is a sum of squared, standardized differences between observed and expected counts. Each count follows a Binomial distribution, which is approximately Normal when n*p and n*q > 5, so the frequencies should be big enough for the χ2 approximation to hold.
If this does not hold, use Fisher's Exact Test.
Note that the observed values don't have to be larger than 5, but the expected values need to be larger than 5 in each cell.
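A minimal sketch of a two-sided Fisher's exact test for a 2 × 2 table, using only `math.comb`: with the row and column totals fixed, each possible table has a hypergeometric probability, and the p-value sums the probabilities of all tables that are no more likely than the observed one. The table is hypothetical:

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    low, high = max(0, row1 + col1 - n), min(row1, col1)
    # small tolerance so tables tied with p_obs are not lost to rounding
    return sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_obs * (1 + 1e-7))

p = fisher_exact_2x2([[3, 1], [1, 3]])  # hypothetical small table
print(round(p, 4))
```

No normal approximation is involved, so small expected counts are not a problem.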
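c) A non-parametric test does not assume that the response variable follows a specific distribution (e.g. a normal distribution).
Ex. Fisher's Exact Test, Wilcoxon signed-rank test (paired data), Mann-Whitney U test (also called the Wilcoxon rank-sum test).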
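A minimal sketch of the Mann-Whitney U statistic, which only uses the ordering of the values: count how often a value from one sample exceeds a value from the other, with ties counting half. To my knowledge this equals the W that R's `wilcox.test(x, y)` reports for the first sample. The samples are hypothetical, and the p-value (exact distribution or normal approximation) is not computed:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x: number of (x_i, y_j) pairs with
    x_i > y_j, counting ties as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

x = [1, 3, 5]   # hypothetical
y = [2, 4, 6]   # hypothetical
u_x = mann_whitney_u(x, y)
u_y = mann_whitney_u(y, x)
print(u_x, u_y)  # u_x + u_y always equals len(x) * len(y)
```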