3 BE-LEC 100 - Lesson 09
4. Compare p value
P VALUE
• Probability value
• Probability of getting the observed value of the test statistic, or a value with even greater evidence against
H0, if the null hypothesis is actually true
• Rule:
• Z Test
• T test
• ANOVA
• Chi-square
• Pearson Correlation
• Spearman Correlation
• Regression
• Mann-Whitney test
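To make the p value definition concrete, here is a minimal sketch for the simplest case, a two-sided z test. It assumes the test statistic follows a standard normal distribution under H0; the function name and the example value 1.96 are illustrative and do not come from the slides.

```python
from statistics import NormalDist


def two_sided_p_from_z(z: float) -> float:
    """Probability, under H0, of a z statistic at least as extreme as the observed one."""
    # Area in both tails beyond |z| of the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative value only: an observed z statistic of 1.96.
p_value = two_sided_p_from_z(1.96)
print(f"p = {p_value:.4f}")
```

A larger |z| means stronger evidence against H0 and therefore a smaller p value.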
Z TEST
T TEST
• Bell-shaped
• Mean, median and mode are equal to 0 and are located at the center of the distribution
• It is a family of curves based on the degrees of freedom, a number related to sample size
• As the sample size _________, the t distribution approaches the normal distribution
• THREE VERSIONS
• Paired sample t-test which compares means from the same group at different times
• One sample t-test which tests the mean of a single group against a known mean
• PROBLEM STATEMENT
• According to the CDC, the mean height of adults ages 20 and older is about 66.5 inches (69.3
inches for males, 63.8 inches for females). Let's test if the mean height of our sample data is
significantly different from 66.5 inches using a one-sample t test.
• H0: _________
H1: _________
• where 66.5 is the CDC's estimate of average height for adults, and xHeight is the mean height of the
sample.
A. TEST VALUE
B. T STATISTIC: calculated by dividing the mean difference (E) by the standard error of the mean
E. MEAN DIFFERENCE: difference between the observed sample mean and the expected mean
CONCLUSION:
There is a significant difference in the mean height between the sample and the overall adult population
(p < .05)
The average height of the sample is about 1.5 inches taller than the overall adult population average.
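The quantities labelled in the output above (test value, mean difference, t statistic) can be reproduced by hand. The sketch below is one way to do it with the Python standard library; the height values are made up for illustration and are not the lesson's dataset, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom, mean difference)."""
    n = len(sample)
    mean_difference = mean(sample) - test_value   # observed mean minus expected mean
    standard_error = stdev(sample) / sqrt(n)      # standard error of the mean
    return mean_difference / standard_error, n - 1, mean_difference


# Hypothetical heights in inches, for illustration only.
heights = [68.0, 70.5, 66.0, 69.5, 67.0, 71.0, 65.5, 68.5]
t_stat, df, diff = one_sample_t(heights, test_value=66.5)
print(f"mean difference = {diff:.2f}, t({df}) = {t_stat:.3f}")
```

The p value is then read from the t distribution with n - 1 degrees of freedom, which is the step SPSS performs for you.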
INDEPENDENT T TEST
• No significant outliers
• USE:
• PROBLEM STATEMENT
• In our sample dataset, students reported their typical time to run a mile, and whether or not they
were an athlete. Suppose we want to know if the average time to run a mile is different for athletes
versus non-athletes. This involves testing whether the sample means for mile time among athletes
and non-athletes in your sample are statistically different
• This tells us that we should look at the "Equal variances not assumed" row for the t test (and corresponding
confidence interval) results
CONCLUSION:
There was a significant difference in mean mile time between non-athletes and athletes
The average mile time for athletes was 2 minutes and 14 seconds faster than the average mile time for
non-athletes.
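The "Equal variances not assumed" row corresponds to what is usually called Welch's t test. As a rough sketch, its t statistic and adjusted degrees of freedom can be computed as follows; the mile times are invented for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's sample variance divided by its own size (no pooling).
    v_a = variance(group_a) / n_a
    v_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical mile times in minutes, for illustration only.
athletes = [6.5, 7.0, 6.8, 7.2, 6.9]
non_athletes = [8.5, 9.8, 9.0, 10.4, 8.1, 9.6]
t_stat, df = welch_t(athletes, non_athletes)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Because the degrees of freedom are adjusted for unequal variances, they are generally not a whole number.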
MANN-WHITNEY TEST
• USE:
• It is used to compare whether two groups containing different people are the same or not.
• Ranks all of the data and then compares the sum of the ranks for each group to determine whether
the groups are the same or not.
• PROBLEM STATEMENT
• Data contain the ratings of 3 car commercials by 18 respondents, balanced over gender and age
category. Our research question is whether men and women judge our commercials similarly.
• NULL HYPOTHESIS:
• Analyze > Nonparametric tests > Legacy Dialogs > 2 independent samples
• _________ method:
• means that p values are estimated based on the assumption that the data, given a sufficiently large
sample size, conform to a particular distribution
• _________ method:
• Although exact results are always reliable, some data sets are too large for the exact p value to be
calculated, yet don’t meet the assumptions necessary for the asymptotic method
• provides an unbiased estimate of the exact p value, without the requirements of the asymptotic
method
• _________:
CONCLUSION:
Women rated the “Family Car” commercial more favorably than men (p = 0.001). The other two
commercials didn't show a gender difference (p > 0.10).
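The rank-sum idea behind the Mann-Whitney test can be sketched in a few lines. The version below pools the data, assigns average ranks to ties, and uses a plain normal approximation for the p value. It applies no tie or continuity correction, so its output may differ slightly from SPSS; the ratings and function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z, two-sided p) using an uncorrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = min(1.0, 2 * NormalDist().cdf(z))  # z <= 0 because u is the smaller U
    return u, z, p


# Hypothetical ratings, for illustration only.
men = [55, 60, 48, 62, 51, 58]
women = [70, 66, 81, 75, 59, 73]
u, z, p = mann_whitney_u(men, women)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

With samples this small the normal approximation is crude, which is the situation where an exact p value is preferable.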
PAIRED T TEST
• Assesses whether the mean difference between paired observations on a particular outcome is significantly
different from zero
• USE:
• Compare how a group of subjects perform under two different test conditions
• PROBLEM STATEMENT
• The sample dataset has placement test scores (out of 100 points) for four subject areas: English,
Reading, Math, and Writing. Suppose we are particularly interested in the English and Math
sections and want to determine whether English or Math had higher test scores on average. We
could use a paired t test to test if there was a significant difference in the average of the two tests.
CONCLUSION:
English and Math scores were weakly and positively correlated (r = 0.243, p < 0.001).
There was a significant average difference between English and Math scores (t(397) = 36.313, p < 0.001).
On average, English scores were 17.3 points higher than Math scores (95% CI [16.36, 18.23]).
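A paired t test amounts to a one-sample t test on the per-subject differences against zero. The following sketch shows that calculation; the scores are made up for illustration and are not the lesson's dataset, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom, mean of the paired differences)."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_difference = mean(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean_difference / standard_error, n - 1, mean_difference


# Hypothetical placement scores for the same five students.
english = [82, 75, 90, 68, 77]
math_scores = [65, 60, 71, 59, 58]
t_stat, df, diff = paired_t(english, math_scores)
print(f"mean difference = {diff:.1f}, t({df}) = {t_stat:.3f}")
```

The degrees of freedom equal the number of pairs minus one, which is why a result reported with 397 degrees of freedom implies 398 complete pairs.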
WILCOXON SIGNED-RANKS TEST
• USE:
• Compare two related samples, matched samples or repeated measurements on a single sample to
assess whether their population mean ranks differ
• PROBLEM STATEMENT
• A car manufacturer had 18 respondents rate 3 different commercials for one of their cars. They first
want to know which commercial is rated best by all respondents.
• Analyze > Nonparametric tests > Legacy Dialogs > 2 related samples
CONCLUSION:
“A Wilcoxon Signed-Ranks test indicated that the “Family car” commercial (mean rank = 10.6) was rated
more favorably than the “Youngster car” commercial (mean rank = 4.0), Z = -3.2, p = 0.001.”
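As a rough illustration of how the Wilcoxon Signed-Ranks test works, the sketch below ranks the absolute paired differences, sums the ranks of the positive and negative differences separately, and converts the smaller sum to a Z value. It drops zero differences and applies no tie or continuity correction, so it may not match SPSS exactly; the ratings and function names are invented.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    ordered = sorted(values)
    # For a value v: average of its first and last 1-based positions in sorted order.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def wilcoxon_signed_rank(first, second):
    """Return (positive rank sum, negative rank sum, z, two-sided p)."""
    differences = [a - b for a, b in zip(first, second) if a != b]
    n = len(differences)
    ranks = average_ranks([abs(d) for d in differences])
    w_pos = sum(r for r, d in zip(ranks, differences) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, differences) if d < 0)
    w = min(w_pos, w_neg)
    z = (w - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = min(1.0, 2 * NormalDist().cdf(z))  # z <= 0 because w is the smaller sum
    return w_pos, w_neg, z, p


# Hypothetical ratings of two commercials by the same eight respondents.
commercial_a = [72, 65, 80, 58, 77, 69, 84, 61]
commercial_b = [60, 66, 71, 50, 70, 55, 79, 52]
w_pos, w_neg, z, p = wilcoxon_signed_rank(commercial_a, commercial_b)
print(f"W+ = {w_pos}, W- = {w_neg}, Z = {z:.2f}, p = {p:.4f}")
```

Dividing each rank sum by the number of differences with that sign gives mean ranks of the kind quoted in the conclusion.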
CHI-SQUARE TEST OF INDEPENDENCE
• The chi-square test of independence is a procedure for testing whether two categorical variables are related in some population
• USE:
• The null hypothesis is that there is no relationship/association between the two categorical
variables
• Compares expected frequencies, assuming the null is true, with the observed frequencies from the
study
• ASSUMPTIONS:
• For a larger table, all expected frequencies > 1 and no more than 20% of all cells may
have expected frequencies < 5
• PROBLEM STATEMENT
• A sample of 183 students evaluated some course. Apart from their evaluations, we also have their
genders and study majors. We'd now like to know: is study major associated with gender? And,
if so, how?
CONCLUSION:
We reject the null hypothesis that our variables are independent in the entire population.
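The comparison of observed and expected frequencies can be sketched as below, together with a check of the expected-frequency rule stated for larger tables. The counts are invented for illustration and are not the 183-student dataset; the function names are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under H0 (independence): row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def meets_expected_count_rule(expected):
    """All expected counts > 1 and at most 20% of cells with expected count < 5."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return all(e > 1 for e in cells) and share_below_5 <= 0.20


# Hypothetical counts: rows are genders, columns are study majors.
observed = [
    [20, 15, 10, 25],
    [18, 22, 30, 12],
]
chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, rule met: {meets_expected_count_rule(expected)}")
```

The p value then comes from the chi-square distribution with the degrees of freedom shown, which SPSS reports in its Chi-Square Tests table.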
• ADJUSTING TABLE
• Pivoting trays > Drag and drop Statistics right underneath “What’s your gender?”