
PSYC 2060

Research and Quantitative Methods in Psychology

Session 2

Representation of distributions (examples)
A finite number of observations can be summarized and represented by a
frequency distribution.
[Figure: bar graph]

Frequency distribution table

Memory score (%)    Frequency
0-20                1
21-40               3
41-60               20
61-80               51
81-100              28

[Figure: histogram]

Source: IBM SPSS
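Converting the frequency table to relative frequencies (each frequency divided by the total number of observations) links a frequency distribution to the probability view on the next slide. A minimal Python sketch that uses only the counts from the table; the variable names are illustrative.

```python
# Frequencies from the table above: memory score band (%) -> number of observations
freq_table = {
    "0-20": 1,
    "21-40": 3,
    "41-60": 20,
    "61-80": 51,
    "81-100": 28,
}

total = sum(freq_table.values())  # total number of observations: 103

# Relative frequency = frequency / total; these proportions sum to 1
rel_freq = {band: count / total for band, count in freq_table.items()}

for band, proportion in rel_freq.items():
    print(f"{band:>7}: {proportion:.3f}")
```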
Probability distribution
A large (or infinite) population of observations is often summarized
and represented by a probability distribution, i.e., in terms of the
probability or relative frequency of occurrence of a variable’s
values.
[Figure: examples of probability distributions]

Probability distribution
• The probability that a continuous variable (e.g., reaction time,
intelligence) takes exactly one particular value is zero
• The probability distribution of a continuous random variable can be
formally described by a probability density function (PDF), whose height
at a given value indicates how likely the variable is to fall close to that value
• The histogram of a sample of data from a population can be taken as
an approximation of the population PDF curve.
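The claim that a histogram approximates the PDF can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 100, SD 15) and a large simulated sample, and rescales each bin's proportion by the bin width so that the histogram height is on the same scale as the PDF. The population values, seed and bin edges are illustrative choices, not values from the slides.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible
pop = NormalDist(mu=100, sigma=15)  # hypothetical population
sample = [random.gauss(100, 15) for _ in range(100_000)]

def density_in_bin(data, lo, hi):
    """Histogram height on the density scale: proportion of data in [lo, hi) divided by bin width."""
    count = sum(lo <= x < hi for x in data)
    return count / (len(data) * (hi - lo))

for lo in range(70, 130, 10):
    height = density_in_bin(sample, lo, lo + 10)
    print(f"[{lo}, {lo + 10}): histogram {height:.4f}, PDF at midpoint {pop.pdf(lo + 5):.4f}")
```

With more observations and narrower bins, the two columns agree more closely.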

[Figure: PDF curve]

Probability distribution
The probability that a continuous random variable
is between two specified values is equal to the
area under the PDF curve over that interval.
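To see that "probability = area under the PDF", the sketch below approximates the area between two limits with the trapezoidal rule and compares it with the difference of the cumulative distribution function. It assumes a standard normal variable and hypothetical limits A = -0.5 and B = 1.0.

```python
from statistics import NormalDist

X = NormalDist(mu=0, sigma=1)  # assumed distribution of X (standard normal)
A, B = -0.5, 1.0               # hypothetical interval limits

# Trapezoidal rule: sum the areas of thin trapezoids under the PDF from A to B
n_steps = 10_000
width = (B - A) / n_steps
area = sum(
    (X.pdf(A + i * width) + X.pdf(A + (i + 1) * width)) / 2 * width
    for i in range(n_steps)
)

# The same probability from the cumulative distribution function: P(A < X < B)
exact = X.cdf(B) - X.cdf(A)
print(f"area under PDF = {area:.4f}, CDF difference = {exact:.4f}")
```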

[Figure, left panel: the shaded area under the PDF of X represents the probability of X taking a value between A and B. Right panel: the shaded area under the PDF of Y represents the proportion of Y scores between C and D.]

Normal distribution
• Many naturally occurring variables are close to being
normally distributed
• The probability density of a normal distribution is precisely
defined by a formula
• A bell-shaped probability density curve does not
necessarily represent a normal distribution.
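The formula referred to is the normal probability density function with mean μ and standard deviation σ: f(x) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²)). The sketch below codes it directly and cross-checks it against Python's statistics.NormalDist; the function name normal_pdf is only an illustrative choice.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# mu = 0 and sigma = 1 give the standard normal curve
for x in (-1.0, 0.0, 1.5):
    print(x, round(normal_pdf(x), 4), round(NormalDist().pdf(x), 4))
```

normal_pdf(0) returns about 0.3989, the height of the standard normal curve at its mean.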
[Figure: standard normal distribution (Cumming & Calin-Jageman)]

Standardized normal distribution

P(-∞ < z < .50) = .6915
P(-∞ < z < 1.17) = .8790
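The tabled values are cumulative probabilities of the standard normal distribution. A minimal sketch computes them from the error function in Python's math module, using Φ(z) = ½[1 + erf(z/√2)]; std_normal_cdf is a hypothetical helper name.

```python
import math

def std_normal_cdf(z):
    """P(-inf < Z < z) for a standard normal variable Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"{std_normal_cdf(0.50):.4f}")  # 0.6915
print(f"{std_normal_cdf(1.17):.4f}")  # 0.8790
```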

Example
What is the probability that a standardized random variable is greater than 1.48 (i.e., the area B under the standard normal distribution curve)?
[Figure: standard normal curve divided at z = 1.48 into area A (left) and area B = 0.0694 (right)]

From the standard normal distribution table, the probability of -∞ < z < 1.48 = the area A under the curve = .9306.

Given that the total area of A and B is equal to 1 (as A and B jointly cover all values of the variable), the required probability = the area B under the curve = 1 - .9306 = .0694
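The same calculation can be done without a printed table. A minimal sketch using the cumulative distribution function in Python's statistics module (Python 3.8 or later):

```python
from statistics import NormalDist

Z = NormalDist()        # standard normal: mean 0, SD 1
area_A = Z.cdf(1.48)    # P(-inf < z < 1.48), the table value
area_B = 1 - area_A     # P(z > 1.48), because the total area under the curve is 1
print(f"A = {area_A:.4f}, B = {area_B:.4f}")  # A = 0.9306, B = 0.0694
```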
Standardized normal distribution

P(z < .8) = .7881

P(z < -1) = .1587

The probability of z being between -1 and .8 = .7881 - .1587 = .6294
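A sketch of the same subtraction with unrounded values. Because the slide subtracts table values that are already rounded to four decimals, the unrounded result (about .62949) rounds to .6295 instead of .6294; the difference is only due to rounding.

```python
from statistics import NormalDist

Z = NormalDist()
p_upper = Z.cdf(0.8)            # P(z < .8), about .7881
p_lower = Z.cdf(-1)             # P(z < -1), about .1587
p_between = p_upper - p_lower   # P(-1 < z < .8)
print(f"{p_between:.4f}")       # 0.6295
```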

Random sampling
• Population: the entire collection of units/observations to
which a study’s conclusions are intended to apply

• Sample: a subset of the units/observations of a population,
selected to represent the population in a study

• Random sampling: (i) each unit/observation of the population
has an equal probability of being included in the sample and
(ii) the selection of one unit/observation is independent of
the selection of every other unit/observation

• True random sampling is rarely feasible in practice
(particularly for large populations). “We usually take samples
of convenience … and hope that their results reflect what we
would have obtained in a truly random sample” (Howell).
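When the population can be listed, drawing a random sample in software is straightforward. The sketch below assumes a hypothetical population of 1,000 numbered units and uses Python's random.sample, which draws without replacement so that every unit has the same chance of selection. Strictly, condition (ii) holds exactly only when sampling with replacement (random.choices), but for large populations the difference is negligible. A repeated-draw check then confirms the equal inclusion probabilities empirically.

```python
import random

random.seed(2060)  # fixed seed so the draw can be reproduced
population = list(range(1, 1001))  # hypothetical population: unit IDs 1..1000

# Simple random sample of 50 units, drawn without replacement
sample = random.sample(population, k=50)
print(sorted(sample)[:10])

# Empirical check: with 10 units and samples of 5, each unit should appear in about half the samples
inclusion_counts = {unit: 0 for unit in range(10)}
n_draws = 20_000
for _ in range(n_draws):
    for unit in random.sample(range(10), k=5):
        inclusion_counts[unit] += 1
print({unit: round(count / n_draws, 3) for unit, count in inclusion_counts.items()})
```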
Sampling error
When we randomly draw samples, the value of a sample’s
statistic (e.g., sample mean) may deviate from the population
value (e.g., population mean) as a result of the particular units/
observations that are included in the sample. This sampling error
does not imply carelessness or any mistake.

(source: Dancey et al.)
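Sampling error can be made concrete by drawing several random samples from one fixed population and comparing each sample mean with the population mean. The population below is hypothetical (10,000 integer scores from 0 to 100), as are the sample size and the seed.

```python
import random
import statistics

random.seed(11)
population = [random.randint(0, 100) for _ in range(10_000)]  # hypothetical scores
mu = statistics.fmean(population)  # population mean

errors = []
for i in range(5):
    sample = random.sample(population, k=25)
    sample_mean = statistics.fmean(sample)
    errors.append(sample_mean - mu)  # sampling error of this sample's mean
    print(f"sample {i + 1}: mean = {sample_mean:.2f}, sampling error = {errors[-1]:+.2f}")
```

Each sample gives a different error even though nothing was done wrongly; the deviation comes only from which units happened to be drawn.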

Sampling distribution
The distribution of all possible outcomes of a
sampling design
Example:
Expected proportion of random samples
(N = 5) from an adult population with
equal proportions of men and women

Possible outcomes: 0 men and 5 women; 1 man and 4 women; 2 men and 3 women; 3 men and 2 women; 4 men and 1 woman; 5 men and 0 women
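Assuming each person is drawn independently with P(man) = .5, the expected proportions for these six outcomes follow the binomial distribution. A short sketch with math.comb (Python 3.8 or later):

```python
from math import comb

N = 5        # sample size
p_man = 0.5  # equal proportions of men and women in the population

probs = {}
for men in range(N + 1):
    # number of orderings with this many men * probability of any one ordering
    probs[men] = comb(N, men) * p_man**men * (1 - p_man) ** (N - men)
    print(f"men = {men}, women = {N - men}: {probs[men]:.5f}")
```

The proportions are 1/32, 5/32, 10/32, 10/32, 5/32 and 1/32, so samples with 2 or 3 men are the most common outcomes.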

Sampling distribution
• If we obtain the values of a sample statistic (e.g., sample mean) from a large
number of random samples (of the same sample size) independently drawn
from a population, the distribution of the obtained values will be an
approximation of the statistic’s sampling distribution
• Conceptually, the sampling distribution is the distribution expected for that
statistic if we drew an infinite number of random samples (of the same sample
size) from the population and calculated the statistic on each sample (Howell).

[Figure: sample mean roll on a die (source: Dancey et al.)]
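The idea of approximating a sampling distribution by repeated sampling can be simulated directly. The sketch below assumes samples of two rolls of a fair die (Dancey et al.'s exact setup is not reproduced here) and tabulates the relative frequency of each possible sample mean over 10,000 simulated samples.

```python
import random
import statistics
from collections import Counter

random.seed(13)
N = 2               # hypothetical sample size: rolls per sample
n_samples = 10_000  # number of simulated random samples

# Each entry is the mean of one sample of N fair-die rolls
means = [sum(random.randint(1, 6) for _ in range(N)) / N for _ in range(n_samples)]

for value, count in sorted(Counter(means).items()):
    print(f"mean {value:.1f}: {count / n_samples:.3f}")
print("mean of sample means:", round(statistics.fmean(means), 3))  # close to 3.5
```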
Sampling distribution of the mean
(a.k.a. sampling distribution of sample means)

Based on the central limit theorem,


• if a population’s scores are normally distributed, the sampling distribution of
the mean will be exactly a normal distribution regardless of the sample size (N);
• if the population scores are not normally distributed, as N increases, the
sampling distribution of the mean approaches a normal distribution, whatever
the shape of the population’s distribution.

[Figure: sampling distribution of the mean, panel labelled “Normally distributed”]

(Rouaud)
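The second point can be illustrated with a clearly non-normal population. The sketch below assumes an exponential population (skewness 2, strongly right-skewed), draws 5,000 samples for each of three sample sizes, and reports the skewness of the resulting sample means; skewness near 0 indicates a more symmetric, normal-like sampling distribution. The population, sample sizes and seed are illustrative assumptions.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed (population) SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

random.seed(14)
skew_by_n = {}
for n in (1, 5, 30):
    # 5,000 sample means, each computed from n draws of an exponential population
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(5_000)]
    skew_by_n[n] = skewness(means)
    print(f"N = {n:>2}: skewness of the sample means = {skew_by_n[n]:.2f}")
```

For this population the theoretical skewness of the sampling distribution of the mean is 2/√N (about 2, 0.89 and 0.37 for N = 1, 5 and 30), so the simulated values should shrink toward 0 as N grows.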

Sampling distribution of the mean
• Central limit theorem (CLT)

(Howell)

• Standard error is the SD of the sampling distribution.


[Figure: simulated sampling distribution (500 samples) of the mean for N = 9 and for N = 25]

(Warner)
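A sketch of the kind of simulation summarized in the figure: 500 sample means for N = 9 and N = 25, whose SD (the simulated standard error) is compared with σ/√N. The population (normal, mean 50, SD 10) and the seed are assumptions; Warner's actual values are not reproduced here.

```python
import math
import random
import statistics

random.seed(15)
mu, sigma = 50, 10  # hypothetical normal population
simulated_se = {}
for n in (9, 25):
    # 500 simulated sample means, each from a random sample of size n
    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(500)]
    simulated_se[n] = statistics.stdev(means)  # SD of the simulated sampling distribution
    print(f"N = {n}: simulated SE = {simulated_se[n]:.2f}, sigma/sqrt(N) = {sigma / math.sqrt(n):.2f}")
```

With N = 25 the theoretical standard error is 10/5 = 2, smaller than 10/3 ≈ 3.33 with N = 9, so the sampling distribution of the mean is narrower for the larger sample.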

Hypothesis testing (example)
A decision-making process in which hypotheses are evaluated statistically.
Example:
• The academic self-efficacy (hereafter referred to as self-efficacy) of all university students
in a region (the population), as measured by a well-established instrument: mean
score = 38.76, SD = 6.31 (higher scores indicate higher self-efficacy)
• The mean self-efficacy score of a sample of 52 students who went through a
mindfulness-based intervention was 40.58. The SD of the mindfulness-trained
population’s scores is assumed to be 6.31 (same as the main population’s SD)
• Both populations’ scores are assumed to be normally distributed.
Research question: Do self-efficacy scores differ between the intervention participants and the non-participants (main population)?
Research hypothesis (H1): The intervention makes a difference in self-efficacy.
Null hypothesis (H0): The research hypothesis is not true; the observed difference (40.58 - 38.76) arises from sampling error only.
[Figure: main (non-participant) population and intervention participant population; under H0 there is no self-efficacy difference between intervention participants and non-participants (population mean = 38.76)]
Hypothesis testing (example)
If H0 is true, i.e., if the intervention participant population has a mean of 38.76
and an SD of 6.31, how likely is it that a sample mean as far from 38.76 as 40.58 would be obtained?
• If the sample-population difference would have been reasonably likely to occur, H0
(the sampling-error explanation) is not rejected
• If the sample-population difference would have been very unlikely to occur (say,
probability < .05), it is reasonable to reject H0, i.e., to conclude that H1 is
supported by the data.
[Figure (not to scale): population distribution (σ = 6.31, population mean = 38.76) and, if the null hypothesis is true, the sampling distribution of the mean, which is normally distributed with a mean of 38.76; observed sample mean = 40.58]
Hypothesis testing (example)
• According to the central limit theorem and the aforementioned assumptions, the expected mean of the sampling distribution = 38.76; the standard error (SE) = 6.31/SQRT(52) = 0.875
• The observed difference (sample mean - expected mean) = 40.58 - 38.76 = 1.82
• If H0 is true, what is the probability of an observed difference as large as 1.82?
• The z score of the sample mean (in the sampling distribution of the mean) = 1.82/0.875 = 2.080
[Figure: sampling distribution (if H0 is true) with mean = 38.76 and SE = 6.31/SQRT(52) = 0.875; the sample mean is located 1.82 (or 2.080 SE) away from the mean]
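The arithmetic on this slide can be reproduced in a few lines; all inputs are the values given in the example, and the variable names are illustrative.

```python
import math

pop_mean = 38.76     # population mean if H0 is true
pop_sd = 6.31        # population SD, assumed the same for the intervention population
n = 52               # sample size
sample_mean = 40.58  # observed sample mean

se = pop_sd / math.sqrt(n)     # standard error of the mean
diff = sample_mean - pop_mean  # observed difference
z = diff / se                  # z score of the sample mean in the sampling distribution
print(f"SE = {se:.3f}, difference = {diff:.2f}, z = {z:.3f}")  # SE = 0.875, difference = 1.82, z = 2.080
```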
Hypothesis testing (example)
• Assuming that the sample means are normally distributed, a z-value greater than +1.96
or less than -1.96 is expected only 5 times out of 100

• Since the observed z-value of 2.08 is beyond the range of ±1.96, the
probability of obtaining such a z-score is below 5%
• If this probability criterion (< 5%) was preset for rejecting H0, H0
can be rejected, i.e., the statement that the intervention participant
population has a mean self-efficacy score of 38.76 can be rejected
• Interpretation: The mean self-efficacy score of students after the
intervention is significantly greater than the main (non-participant)
population’s mean.
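The decision rule on this slide can be sketched as follows; the critical value is computed from the inverse CDF instead of being looked up, and α = .05 (two-tailed) is the preset criterion from the example.

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
z_observed = 2.08

# Reject H0 when the observed z falls beyond +/- z_crit (the region of rejection)
reject_h0 = abs(z_observed) > z_crit
print(f"critical value = +/-{z_crit:.2f}, reject H0: {reject_h0}")  # critical value = +/-1.96, reject H0: True
```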
Statistical significance
• When a study’s results are statistically significant, it means that if the null
hypothesis were true, it would be unlikely for the sample results to have turned out as they did
• The criterion for statistical significance, or the alpha level (e.g., 5%), should be
preset
• Critical value of the test statistic (e.g., z or t): the value that separates the region
of rejection from the region of non-significance in the sampling distribution
• p value: the probability of obtaining the observed test statistic value or a more
extreme value if the null hypothesis is true.

[Figure: sampling distribution of the test statistic if the null hypothesis is true (area = 1 - alpha); y-axis: probability density; critical value marked (source: Hatcher)]
p value (referring to the example)
• An example of computer output for the mindfulness intervention case (the R
command is not in the scope of this course)

• This p value corresponds to the probability of observing the test statistic value
(2.08) or a more extreme value if the null hypothesis is true.
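The R output itself is not reproduced here. As an illustration only (hand calculation is not required in this course), a two-tailed p value for z = 2.08 can be computed from the standard normal CDF; the value may differ slightly from the course's output depending on how many decimals of z were used.

```python
from statistics import NormalDist

z = 2.08
# Two-tailed p value: probability of a z at least as extreme as 2.08 in either direction if H0 is true
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"p = {p_two_tailed:.4f}")  # p = 0.0375
```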

NOTE: In this course, students are not required to obtain the p value of a test statistic by hand-calculation or from a statistical table

Statistical significance
• If the p value of the sample’s test statistic is no more than alpha (α), i.e., the test
statistic is in the region of rejection:
  - the null hypothesis is rejected
  - the results are statistically significant; the research hypothesis is supported
• If the p value of the sample’s test statistic is more than alpha (α), i.e., the test
statistic is in the region of non-significance:
  - the null hypothesis is not rejected (but not “accepted” either)
  - the research hypothesis is not supported by the results.
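The two branches above amount to a simple rule. A sketch with a hypothetical helper decide(); the wording it returns mirrors the slide, including the point that a non-significant result does not mean H0 is accepted.

```python
def decide(p_value, alpha=0.05):
    """Apply the preset alpha: reject H0 only if p <= alpha."""
    if p_value <= alpha:
        return "reject H0: statistically significant; research hypothesis supported"
    return "do not reject H0 (not 'accepted'); research hypothesis not supported"

print(decide(0.0375))  # the mindfulness example
print(decide(0.20))
```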

Reporting guidelines
Wilkinson, L. (1999). Statistical Methods in Psychology Journals: Guidelines
and Explanations. The American Psychologist, 54(8), 594–604:
• It is hard to imagine a situation in which a dichotomous accept-reject
decision is better than reporting an actual p value or, better still, a
confidence interval
• Never use the unfortunate expression “accept the null hypothesis”
• Always provide some effect-size estimate when reporting a p value.

What we should not say about p values (Warner, 2020):


• If the computer output shows the p value of a statistic as .000, report it as
“p < .001”, not “p = .000”
• If the p value of the statistic is very small (e.g., p = .002), do not describe
the result as “highly significant”
• If the p value is higher than the alpha level by a small amount, do not
describe the result as “almost significant”, “close to significant”,
“marginally significant”, etc.
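The first rule can be built into a small formatting helper. This is a sketch: format_p is a hypothetical name, and reporting three decimals without a leading zero is an assumed convention (common in APA style), not something stated on the slide.

```python
def format_p(p):
    """Report p without a leading zero; values below .001 become 'p < .001' (never 'p = .000')."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

print(format_p(0.0000312))  # p < .001
print(format_p(0.2))        # p = .200
```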
