Statistics
- Makes sense of variability
- Takes into account individual differences
- Not only about math; also about decision-making and drawing inferences when solving
certain problems
- Values the scientific method
Karl Pearson
- “Statistics is the grammar of science”
- Biographer of Francis Galton (the one who introduced correlation)
- Used the letter “r” (for regression) as a tribute to Galton
Francis Galton
- Eugenics
o Philosophy in which people are selectively mated to produce talented,
“fit” offspring
- Created his own intelligence tests
Population
- Entire group of individuals of interest in research
- N
Sample
- Selected to represent the population in a research study
- Results are generalized to population
- n
CAUTION!
Based on the APA Publication Manual (7th edition)
- use N to designate the number of members in the total sample
- use n to designate the number of members in a limited portion or subsample of the total
sample
Describing Data
1. Parameter – a value that describes a population
2. Statistic – a value that describes a sample
Variables
- Characteristic or condition that can change or take on different values
- Research objective/question
o Relationship between variables for a specific group of individuals
Types of Variables
1. Discrete
- Separate, indivisible categories (class size, gender)
2. Continuous
- Infinitely divisible units (such as time)
o Real Limits
Boundaries set for each measurement category or interval for continuous
variables
Located exactly half-way (or half-unit) between adjacent categories
Upper real limit = category + ½ unit
Lower real limit = category – ½ unit
Ex. for the category 76, the lower real limit is 75.5 (Person A) and the upper real limit is 76.5 (Person B)
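The real-limit rule can be sketched in Python as below. This is a minimal illustration assuming scores are recorded in whole units unless a different `unit` is passed; `real_limits` is a hypothetical helper name, not a library function.

```python
def real_limits(score, unit=1.0):
    """Return (lower, upper) real limits of a measurement category.

    Assumes each recorded score stands for an interval one `unit` wide,
    centred on the score: (category - 1/2 unit, category + 1/2 unit).
    """
    half = unit / 2
    return score - half, score + half


print(real_limits(76))        # (75.5, 76.5)
print(real_limits(3.2, 0.1))  # limits for a score measured to the nearest tenth
```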
Measuring Variables
1. Nominal Scale
a. Unordered set of categories
b. Identified only by name
c. Used to sort individuals and determine whether they are similar or different
d. Categorical
e. Has neither direction nor magnitude
2. Ordinal Scale
a. Ordered set of categories (ranked)
b. Direction of difference between individuals (greater than or less than)
c. Only has direction
3. Interval Scale
a. Ordered series of equal-sized categories
b. Identify the direction and magnitude of a difference
c. Arbitrary location of the zero point
4. Ratio Scale
a. Is similar to an interval scale
b. Value of zero indicates the absence of the variable (absolute zero)
c. Identify the direction and magnitude of differences
d. Allow ratio comparisons of measurements
Research Methods
1. Experimental
2. Non-experimental
a. Qualitative research designs (ex. phenomenology, case studies)
b. Surveys and interviews (including focus group discussions, FGDs)
c. Correlational designs
d. Quasi-experimental designs
Correlational Studies
- To determine whether a relationship exists between two variables
- To describe the relationship
- Simply observes the two variables as they exist naturally
- Cannot establish a causal relationship
Experiments
- Demonstrate a cause-and-effect relationship between two variables; allow researchers to draw causal
inferences about behaviour
- Show that changing the value of one variable (independent variable) causes changes to
occur in an observed variable (dependent variable)
Four Basic Elements in an Experiment
Manipulation – researcher manipulates one variable (IV) by changing its value to create a
set of two or more treatment conditions (levels of IV)
Measurement – a second variable (DV) is measured to obtain a set of scores in each
treatment condition
Compare – scores in one treatment condition are compared with scores in another
treatment condition
Control – all other variables (extraneous variables, EVs) are controlled to ensure that they do not
influence the two variables being examined
**Be specific when describing the IVs and DVs (e.g., state which phobia and its intensity, if applicable)
Hypothetical Constructs/Concepts
- Unseen processes postulated to explain behaviour
- Can’t be observed directly (hence the need to define them operationally)
- Ex. stress, honesty, memory
Conceptual Definition
- “dictionary definition”
Operational Definitions
- Specifies the precise meaning of a variable within an experiment
- Defines a variable in terms of observable operations, procedures and measurements
- Describes operations involved in manipulating or measuring the variables in an
experiment (procedures/instructions on how to carry out the experiment)
- Varies from one experiment to another
- Experimental OD (for an IV) or Measured OD (for a DV)
Extraneous Variable
- Any variable in a research study other than the specific variables being studied
- Factors that are not the focus of the experiment but can influence the findings
- Increases variability in the scores, making it more difficult to detect a significant
difference/treatment effect/interaction
- Ex. equipment failures (noises), inconsistent instructions
Confounding Variables
- When an EV changes systematically across the different treatment conditions of an
experiment
- When an EV changes in a way that parallels the variable intentionally being studied
- Sabotages the experiment (the experiment is not internally valid, so its results are doubtful to interpret)
The EV threatens the experiment’s internal validity because it changes systematically along with the
IV. A confounding variable (CV) makes the experiment internally invalid.
Experiments
General variables to control
- Participant variable (ex. age, sex, IQ)
- Environmental variable (ex. noises, time of day, weather)
Control techniques
- Random assignment of participants
- Matching participants or environment through assignment
- Holding variables constant
Quasi-Experimental Designs
- Similar to experiments but lacks one or more of its essential elements
o No manipulation of variable to differentiate the groups
o No random assignment to treatment conditions
- Cannot demonstrate cause-and-effect relationships; simply demonstrate and describe
relationships, similar to correlational research
- Use quasi-independent variables (considered independent variables) to differentiate the
groups
o Pre-existing participant or environmental variables
o Time lapse (pre-post)
- Ex post facto studies and Non-equivalent groups
o Pre-existing participant or environmental variables differentiate the groups
o Cannot control assignment of participants to groups and cannot assure group
equivalence
- Pre-post Study
o Time passage used to differentiate groups
o Cannot control variables related to time
- Developmental research designs
o Examine changes in behaviour related to age
o cross-sectional studies and longitudinal studies
Research on cross-sectional vs. longitudinal studies
Sampling Error
- The discrepancy between a sample statistic and its population parameter
o Though samples are generally considered to be representative of the entire
population, a sample is not expected to give a perfectly accurate picture of it
Chapter 4: Variability
Variability
- A quantitative measure
- To describe the distribution
o How spread out or clustered the scores are in a distribution
o Distance of score or group of scores from each other
- Know how representative a score or group of scores is/are of the entire population
o Smaller distances (clustered) – the score is a better representation of the
population
o Bigger distances (spread out) – more difficult to find a score that is a good
representation of the population
o Variability seen as an error – sampling error
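The quantitative measures of variability listed for the exam (SS by the definitional and computational formulas, and variance and SD for a population versus a sample) can be sketched as follows. This is a minimal illustration; the score set is hypothetical and the function names are made up for this example.

```python
import math

def ss_definitional(scores):
    """SS = sum of (X - mean)^2."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_computational(scores):
    """SS = sum(X^2) - (sum X)^2 / N; gives the same value as the definitional formula."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)

def variance(scores, sample=False):
    """Population variance = SS / N; sample variance = SS / (n - 1)."""
    df = len(scores) - 1 if sample else len(scores)
    return ss_definitional(scores) / df

def standard_deviation(scores, sample=False):
    """SD is the square root of the variance."""
    return math.sqrt(variance(scores, sample))

scores = [1, 9, 5, 8, 7]   # hypothetical set of scores, mean = 6
print(ss_definitional(scores), ss_computational(scores))   # 40.0 40.0
print(variance(scores), variance(scores, sample=True))     # 8.0 10.0
```

Dividing by n - 1 for a sample corrects the tendency of sample variability to underestimate population variability.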
Chapter 5: z-scores
Z-score
- Value of the z-score tells the exact location of a score relative to all the other scores in the
distribution
- Standardizing the entire distribution allows us to compare scores even if they are from
different tests – two (or more) different distributions can be made comparable
- Mean is 0
- Standard deviation is 1
- If the original distribution is changed into a z-distribution, the shape won’t change
- Changing an x-value into a z-score involves creating a signed number that:
o Specifies the precise location of each x-value in a distribution
o The (+ or -) sign identifies the location, either above (+) or below (-) the mean
o The numerical value of the z-score corresponds to the number of standard
deviations between x and the mean
o (for a population) $z = \frac{X - \mu}{\sigma}$; solving for X gives $X = \mu + z\sigma$
o (for a sample) $z = \frac{X - M}{s}$
o As descriptive statistics, z-scores describe exactly where each individual is
located
o As inferential statistics, z-scores determine whether a specific sample is
representative of its population, or is extreme and unrepresentative
If z-score near 0 -> fairly typical/representative individual
If z-score at the extreme tails -> “noticeably different” from the others
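The two z-score formulas (X to z, and z back to X) can be sketched as a pair of small functions. This assumes the population values μ = 100 and σ = 15, which are hypothetical; for a sample, pass M and s instead of μ and σ.

```python
def z_score(x, mean, sd):
    """Number of SDs between x and the mean; the sign says above (+) or below (-)."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Invert the z formula: X = mean + z * sd."""
    return mean + z * sd

print(z_score(130, 100, 15))    # 2.0  (two SDs above the mean)
print(raw_score(-1.5, 100, 15)) # 77.5 (one and a half SDs below the mean)
```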
Properties of z-score
1. The mean of a distribution of z-scores is always 0
2. The standard deviation of a distribution of z-scores is always 1
3. The sum of the squared z-scores is always N
4. Transforming the original distribution to a distribution of z-scores does not change the
shape of the original distribution and does not change the location of any individual score
relative to others in the distribution
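The four properties can be checked numerically by standardizing a small population. This is a sketch under the assumption that the population SD (SS divided by N) is used; the six scores are hypothetical.

```python
import math

def to_z_scores(scores):
    """Standardize a whole population using mu and the population SD (SS / N)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]

population = [0, 6, 5, 2, 3, 2]   # hypothetical population: mu = 3, sigma = 2
z = to_z_scores(population)
print(z)                          # [-1.5, 1.5, 1.0, -0.5, 0.0, -0.5]

n = len(z)
print(sum(z) / n)                              # property 1: mean is 0
print(math.sqrt(sum(v * v for v in z) / n))    # property 2: SD is 1
print(sum(v * v for v in z))                   # property 3: sum of squared z-scores equals N
# property 4: every individual keeps the same rank order (shape is unchanged)
print(sorted(range(n), key=population.__getitem__) == sorted(range(n), key=z.__getitem__))  # True
```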
Other standardized distributions based on z-scores
- Although transforming x values into z-scores creates a standardized distribution, many
people find z-scores burdensome because they consist of many decimal values and
negative numbers.
- More convenient to standardize a distribution into numerical values that are simpler than
z-scores
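A standardized distribution with simpler numbers is built in two steps: convert X to a z-score, then place that z-score in a distribution with the chosen mean and SD. In the sketch below, the original μ = 57 and σ = 14 are hypothetical, and the new mean of 50 and SD of 10 are an illustrative choice, not values given in these notes.

```python
def standardize_to(x, mean, sd, new_mean=50, new_sd=10):
    """Re-express X in a new distribution with a chosen mean and SD.

    Step 1 locates X as a z-score; step 2 places that same z-score in the
    new distribution, so every individual keeps its relative position.
    """
    z = (x - mean) / sd
    return new_mean + z * new_sd

# Hypothetical exam with mu = 57 and sigma = 14
print(standardize_to(64, 57, 14))   # 55.0 (z = +0.5)
print(standardize_to(43, 57, 14))   # 40.0 (z = -1.0)
```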
What are the applications of z-scores?
- Helps identify the location of a score relative to other scores
Probability
- Method for measuring and quantifying the likelihood of obtaining a specific sample from
a specific population
- A fraction or a proportion
- Determined by a ratio comparing the frequency of occurrence for a specific outcome
relative to the total number of possible outcomes
- When a population of scores is represented by a frequency distribution, probabilities can
be defined by proportions of the distribution
- In graphs, probability can be defined as a proportion of area under the curve
- Whenever the scores in a population are variable, it is impossible to predict with perfect
accuracy exactly which score or scores will be obtained when you take a sample from the
population
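Probability as a proportion of a frequency distribution can be sketched as "frequency of the outcome divided by total outcomes". The population of ten scores below is hypothetical, and exact fractions are used so the ratio is visible.

```python
from collections import Counter
from fractions import Fraction

population = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical population, N = 10
freq = Counter(population)                     # frequency distribution: score -> f

def probability(event, freq):
    """p(event) = (frequency of outcomes satisfying event) / (total number of outcomes)."""
    total = sum(freq.values())
    hits = sum(f for score, f in freq.items() if event(score))
    return Fraction(hits, total)

print(probability(lambda x: x > 3, freq))    # 2/5  (4 of the 10 scores)
print(probability(lambda x: x == 3, freq))   # 3/10
```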
Probability and Inferential Statistics
- If the sample has a high probability of being obtained from a specific population, then the
researcher can conclude that the sample is likely to have come from that population
- If the sample has a low probability of being obtained from a specific population, that
population is probably not the source of the sample
- Samples in the extreme tails of the distribution probably did not come from the population
Random Sampling
- Ensures that all members of the population have an equal chance of being selected
- Process of sampling with replacement is utilized
- Why use a random sample?
o In order to apply the laws of probability to the sample
o Results in a sample that should be representative of the population
- Sampling with replacement
o Each member of the population selected for the sample is returned to the
population before the next member is selected
o Thus, the probability of selecting any one individual stays constant from one
selection to the next (a requirement of random sampling when more than one individual is selected)
- Sampling without replacement
o Members of the sample are not returned to the population before subsequent
members are selected
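The difference between the two sampling methods shows up in the probability of picking any particular individual on each successive draw. This sketch assumes a hypothetical population of 10 individuals; `selection_probabilities` is a made-up helper.

```python
from fractions import Fraction

def selection_probabilities(population_size, draws, replace):
    """Probability of picking any one individual still in the population, draw by draw."""
    probs = []
    remaining = population_size
    for _ in range(draws):
        probs.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1   # the selected member is not returned to the population
    return probs

print(selection_probabilities(10, 3, replace=True))   # [Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]
print(selection_probabilities(10, 3, replace=False))  # [Fraction(1, 10), Fraction(1, 9), Fraction(1, 8)]
```

Only sampling with replacement keeps the probability constant, which is why it is used when the laws of probability are applied.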
Probability and the Normal Distribution
- The unit normal table lists the proportions corresponding to each z-score
location
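The proportions in a unit normal table can also be computed instead of looked up. This sketch uses Python's standard `statistics.NormalDist` (3.8+) and assumes the common textbook column layout of body, tail, and proportion between the mean and z; `unit_normal_row` is a hypothetical name.

```python
from statistics import NormalDist

def unit_normal_row(z):
    """Return (body, tail, mean-to-z) proportions for a z-score.

    The normal curve is symmetric, so a negative z shares the row of |z|.
    """
    z = abs(z)
    below = NormalDist().cdf(z)   # area to the left of z (mean 0, SD 1)
    return below, 1 - below, below - 0.5

print([round(p, 4) for p in unit_normal_row(1.00)])   # [0.8413, 0.1587, 0.3413]
print([round(p, 4) for p in unit_normal_row(-1.96)])  # [0.975, 0.025, 0.475]
```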
Probability and the Binomial Distribution
- Binomial distributions
o Are formed by a series of observations (for example, 100 coin tosses) for which
there are exactly two possible outcomes
o The two outcomes are identified as A and B
o The probabilities are p(A) = p and p(B) = q, where p + q = 1
- When pn and qn are both at least 10
o The binomial distribution is closely approximated by a normal distribution
With a mean of $\mu = pn$
Standard deviation of $\sigma = \sqrt{npq}$
o A z-score can be computed for each value of X and the unit normal table can be
used to determine probabilities for specific outcomes
- Binomial values are actually discrete – use the appropriate upper and lower real
limits in the computation of z-scores
- How to:
o Graph the distribution and identify area of interest
o Find the real-limit equivalents of the area of interest
Ex. 15 or more -> use the lower real limit of 15 (14.5)
Ex. more than 15 -> use the upper real limit of 15 (15.5)
Ex. 15 or less -> use the upper real limit of 15 (15.5)
Ex. less than 15 -> use the upper real limit of 14 (14.5)
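The normal approximation with real limits can be sketched for the 100-coin-toss case, asking for p(60 or more heads). The choice of 60 is hypothetical; the exact binomial probability is computed alongside only to show how close the approximation is.

```python
import math
from statistics import NormalDist

def normal_approx_at_least(k, n, p):
    """Approximate p(X >= k) for a binomial X with a normal distribution.

    'k or more' starts at the lower real limit of k, i.e. k - 0.5.
    """
    q = 1 - p
    if p * n < 10 or q * n < 10:
        raise ValueError("pn and qn should both be at least 10")
    mu = p * n
    sigma = math.sqrt(n * p * q)
    z = (k - 0.5 - mu) / sigma
    return 1 - NormalDist().cdf(z)   # area in the tail beyond z

def exact_at_least(k, n, p):
    """Exact binomial p(X >= k), for comparison only."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# mu = 50, sigma = 5, z = (59.5 - 50) / 5 = 1.9
print(round(normal_approx_at_least(60, 100, 0.5), 4))
print(round(exact_at_least(60, 100, 0.5), 4))
```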
For First Long Exam (said to be the easiest)
- Bring own calculator, 2 pencils, eraser, black/blue ballpen, and correction fluid/tape
- No need to bring bluebook and unit normal table
- Chapters 1-6; SPSS basics
- Interpolation
- Central Tendency and Variability
- Formulas to memorize
o SS (definition and/or computational)
o Variance and SD (Population and Sample)
o Z-score (how to get z-score and value of X)
Population distribution
- A collection of all population scores
Sampling distribution
- A collection of statistics drawn from all possible samples of a specific size from a
population
o An example of a sampling distribution is the distribution of sample means, which
is a collection of sample means of all possible random samples of a particular
sample size that can be obtained from the population
Sample distribution
- A collection of all sample scores
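A distribution of sample means can be built by listing every possible random sample of a given size. This sketch assumes a hypothetical population of four scores (μ = 5) and samples of n = 2 drawn with replacement; the spread of the sample means around μ is the sampling error described earlier.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]   # hypothetical population of N = 4 scores; mu = 5
n = 2                       # sample size

# Every possible random sample of size n (sampling with replacement, order counts)
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))                           # 16
print(sum(sample_means) / len(sample_means))  # 5.0 (the mean of the sample means equals mu)
print(sorted(Counter(sample_means).items()))  # how often each sample mean occurs
```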
Hypothesis
- Thesis/main idea of an experiment/study consisting of a statement that predicts the
relationship between at least two variables
Hypothesis Testing
- The general goal is to rule out chance (sampling error) as a plausible explanation for the
results from a research study
- Technique to help determine whether a specific treatment has a significant effect on the
individuals in a population
- “significant” – result is very unlikely to occur by chance alone
- Purpose is to decide between two possible explanations for the observed difference
1. It is simply sampling error
2. It is too large to be explained by sampling error (a real treatment effect)
Four Steps
1. State the hypothesis
a. Null hypothesis (Ho)
i. Always states that the treatment has no effect
ii. Population means before and after are the same
b. Alternative hypothesis (H1/Ha)
i. States that the treatment has an effect (there is a change, a difference or a
relationship for the general population)
ii. The population means before and after treatment are different, such that
their difference is too large to be explained by chance/sampling error
2. Set the criteria for a decision/locate the critical region
3. Collect data and compute sample statistic
4. Make a decision about the Ho and state the conclusion
Example 1
Step 1:
Ho: $\mu = 22.70$
- Jokoy’s performance has no significant effect on the joviality of the audience.
Ha: $\mu \neq 22.70$
- Jokoy’s performance has a significant effect on the joviality of the audience.
Step 2:
The distribution of sample means can be divided into two sections
Alpha Level / Level of Significance (α)
- To define the boundaries that separate the high-probability samples from the low-
probability samples
- Common alpha level values: α = 0.05, α = 0.01, α = 0.001
- If α = 0.05, then we separate the most unlikely 5% of the sample means (i.e., sample
means located in the extreme tails) from the most likely 95% of the sample means (i.e.,
sample means located in the center)
Critical Region
- Consists of outcomes that are very unlikely to occur if the null hypothesis is true (aka low
probability samples)
- Is defined by sample means that are almost impossible to obtain if the treatment has no
effect
o The phrase “almost impossible” means that these samples have a probability that
is less than the alpha level
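- Not to be confused with the critical region boundaries (the z-scores that mark where the
critical region begins, e.g., z = ±1.96 when α = 0.05)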
- Conclude that the treatment has no significant effect if the sample mean is located in the
middle 95% of the distribution (assuming α = 0.05)
- Conclude that the treatment has a significant effect if the sample mean is located in the
critical region (i.e., less than z = -1.96 or more than z = 1.96)
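The boundaries of a two-tailed critical region follow from splitting α equally between the two tails. This sketch computes them with the standard-library `statistics.NormalDist.inv_cdf` (Python 3.8+); the helper names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Positive boundary of a two-tailed critical region (alpha split over both tails)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_critical_region(z_obtained, alpha=0.05):
    """True when the obtained z falls beyond either boundary."""
    return abs(z_obtained) > two_tailed_critical_z(alpha)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(two_tailed_critical_z(alpha), 2))
# 0.05 1.96
# 0.01 2.58
# 0.001 3.29
```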
Step 3:
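Test statistic (in this case, the z-score) forms a ratio comparing the obtained difference between the
sample mean and the hypothesized population mean with the difference expected by chance (the standard error)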
Step 4: Make a decision
A large value for the test statistic shows that
- The obtained mean difference is larger than would be expected if there were no treatment effect
- Is it large enough to be in the critical region?
o If yes, conclude that the difference is significant, i.e., that the treatment has a significant
effect
o Reject the null hypothesis
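The four steps for Example 1 can be sketched as a two-tailed z test. Only μ = 22.70 comes from the notes; σ = 4.0, n = 16 and M = 25.10 are invented for illustration, and σ is assumed known. The ratio follows $z = \frac{M - \mu}{\sigma_M}$ with $\sigma_M = \frac{\sigma}{\sqrt{n}}$.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed z test for one sample mean when sigma is known."""
    standard_error = sigma / math.sqrt(n)           # difference expected by chance
    z = (sample_mean - mu) / standard_error         # obtained difference / chance difference
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = .05
    decision = "reject H0" if abs(z) > critical else "fail to reject H0"
    return z, decision

# Step 1: H0: mu = 22.70. Step 2: alpha = .05. Step 3: compute z. Step 4: decide.
z, decision = z_test(sample_mean=25.10, mu=22.70, sigma=4.0, n=16)
print(round(z, 2), decision)   # 2.4 reject H0
```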
Directional Tests
- When a research study predicts a specific direction for the treatment effect
- Possible to incorporate the directional prediction into the hypothesis test
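One way to incorporate the directional prediction is to place the entire α in the predicted tail, which lowers the boundary from 1.96 to about 1.645 at α = .05. This is a sketch; the function name and the "increase"/"decrease" labels are hypothetical.

```python
from statistics import NormalDist

def one_tailed_significant(z_obtained, alpha=0.05, predicted="increase"):
    """Directional test: the whole alpha sits in the tail the prediction points to."""
    critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = .05
    if predicted == "increase":
        return z_obtained > critical
    if predicted == "decrease":
        return z_obtained < -critical
    raise ValueError("predicted must be 'increase' or 'decrease'")

print(one_tailed_significant(1.80))                        # True (1.80 > 1.645)
print(one_tailed_significant(1.80, predicted="decrease"))  # False (wrong direction)
```

Note that z = 1.80 would not reach the two-tailed boundary of 1.96, so the directional test is more sensitive in the predicted direction only.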
Factors Affecting a Hypothesis Test
- Difference between sample mean and hypothesized population mean
- Standard error
o Variability of scores (high variability decreases the likelihood of finding a significant
treatment effect)
o Sample size (a larger sample size increases the likelihood of finding a significant treatment effect)
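Both factors act through the standard error, $\sigma_M = \frac{\sigma}{\sqrt{n}}$ (the standard formula, assumed here since the notes only name the factors). The sketch holds a hypothetical mean difference of 3 points fixed and varies σ and n.

```python
import math

def standard_error(sigma, n):
    """sigma_M = sigma / sqrt(n): expected distance between M and mu by chance."""
    return sigma / math.sqrt(n)

mean_difference = 3   # hypothetical M - mu, held fixed
for sigma, n in [(10, 25), (20, 25), (10, 100)]:
    se = standard_error(sigma, n)
    print(sigma, n, se, mean_difference / se)
# 10 25 2.0 1.5     baseline
# 20 25 4.0 0.75    more variability -> larger SE -> smaller z
# 10 100 1.0 3.0    larger sample -> smaller SE -> larger z
```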
Independent-Measures Designs
- Allows researchers to evaluate the mean difference between two populations using the
data from two separate samples
- The identifying characteristic of the design is the existence of two separate or
independent samples
- Can be used to test for mean differences between
o Two distinct populations (such as men versus women)
o Two different treatment conditions (such as drug versus no drug)
- Used when a researcher has no prior knowledge about either of the two populations (or
treatments) being compared
o The population means and variances/SDs are all unknown, and these values must be
estimated from the sample data
- General purpose of the independent-measures t test is to
o Determine whether the sample mean difference obtained in a research study indicates
A real mean difference between the two populations, or
A difference due to chance/sampling error
Steps in Hypothesis Testing with the Independent-Measures t statistic
1. State the hypothesis
a. For the independent-measures t test, Ho states that there is no difference between
the two population means
2. Locate the critical region
The Homogeneity of Variance Assumption
- If the assumption is violated, then the t statistic contains two questionable values:
1. The value for the population mean difference which comes from the null hypothesis,
and
2. The value for the pooled variance
- Cannot determine which of these two values is responsible for a t statistic that falls in the
critical region
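The independent-measures t statistic can be sketched to show where the two questionable values enter: the hypothesized population mean difference (0, from Ho) and the pooled variance, which is only a fair combined estimate if the homogeneity of variance assumption holds. The scores are hypothetical, and the critical value still has to come from a t table using the returned df.

```python
import math

def independent_t(sample1, sample2):
    """Independent-measures t statistic using the pooled variance.

    Assumes homogeneity of variance, and that Ho states mu1 - mu2 = 0.
    Returns (t, df).
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    df1, df2 = n1 - 1, n2 - 1
    pooled_var = (ss1 + ss2) / (df1 + df2)              # questionable value 2: pooled variance
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)   # estimated SE of (M1 - M2)
    hypothesized_difference = 0                         # questionable value 1: from Ho
    t = ((m1 - m2) - hypothesized_difference) / se
    return t, df1 + df2

# Hypothetical scores for two separate groups (e.g., drug vs. no drug)
t, df = independent_t([5, 7, 9, 7], [3, 5, 4, 4])
print(round(t, 2), df)   # 3.29 6
```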