
Sampling Methods and Inferential Statistics

Suparat Walakanon D5220038

Presentation Topics
1. Sampling Methods
   - Population
   - Sample
   - Sampling Methods
2. Inferential Statistics
   - Parametric Tests
   - Nonparametric Tests

What is a population?
A population is the complete collection of specific types of elements such as scores, people, and other shared variables to be studied.

A population must be clearly defined in terms of the following 3 aspects:


- Content: research subjects
- Extent: geographical boundaries
- Time: the time period under consideration
Frankfort-Nachmias and Nachmias (1996)

Example: first-year SUT undergraduate students enrolled in the English I course in Trimester 1/2010.

What is sampling?
Sampling is the process of selecting a small number of elements from a larger target group of such elements, so that the data gathered from the small group allow judgments or claims to be made about the population.

Sampling Frame
A sampling frame is the actual set of units from which a sample is drawn. It should cover all the sampling units in the population of interest.

Potential Problems of a Sampling Frame


1. Incomplete frames: units are missing from the frame (e.g. the names of late-enrolled students)
2. Clusters of elements: sampling units are listed in clusters (separate groups) rather than individually
3. Blank foreign elements: non-members of the population are included in the sampling frame

Sampling Methods

Probability sampling

Nonprobability sampling

Probability Sampling
A sampling method in which every member of the population has a known, nonzero chance (probability) of being selected.

Nonprobability Sampling
A sampling method in which the chance (probability) of selecting each member of the population is unknown and not equal.

Probability Sampling
- Simple random sampling
- Systematic random sampling
- Stratified random sampling
- Cluster sampling

Nonprobability Sampling
- Convenience sampling
- Judgment (purposive) sampling
- Quota sampling

Simple Random Sampling (SRS)


The probability of being selected is equal for all members of the population.
- Blind Draw Method (e.g. names placed in a box and then drawn randomly)
- Random Numbers Method (all items in the sampling frame are given numbers, and numbers are then drawn using a table or computer program)
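
The random numbers method can be sketched in Python as follows. This is only an illustration: the frame of 200 numbered students, the sample size of 20, and the seed are assumptions, not values from the presentation.

```python
import random

# Hypothetical sampling frame: 200 students numbered 1..200 (assumption).
sampling_frame = list(range(1, 201))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(sampling_frame, 20, seed=42)
print(sorted(sample))
```

`random.sample` draws without replacement, so no student can be selected twice.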

Advantages of SRS
- Fair
- Unbiased

Disadvantages of SRS
- Some subgroups may be over- or under-sampled by chance
- No guarantee that the sample is a good representative of the population

Systematic random sampling


A sample is obtained by selecting every k-th participant (e.g. every 15th) from a list containing the total population, after a random start.
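
A minimal Python sketch of this procedure, assuming a hypothetical list of 150 participants and an interval of k = 15 (both numbers are illustrative):

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th unit after a random start within the first k units."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start index in 0..k-1
    return frame[start::k]

frame = list(range(1, 151))  # hypothetical list of 150 participants
sample = systematic_sample(frame, 15, seed=1)
print(sample)
```

Only the starting point is random; every later selection is fixed by the interval, which is why a list with a repeating pattern can cause periodicity problems.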

Advantages of Systematic Random Sampling


- Efficiency: there is no need to designate (assign a number to) every population member, only those early in the list (unless the sampling frame is very large).
- Less expensive and faster than SRS

Disadvantages of Systematic Random Sampling


- Small loss in sampling precision
- Potential periodicity problems

Stratified Sampling
The population is separated into homogeneous groups/segments/strata and a sample is taken from each. The results are then combined to get the picture of the total population.
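
One common way to do this is proportional allocation, where the same fraction is drawn from every stratum. The sketch below assumes three hypothetical strata (faculties) and a 10% sampling fraction; neither comes from the presentation.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from every stratum, then combine the draws."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        n = round(len(members) * fraction)  # stratum sample size
        combined.extend((name, m) for m in rng.sample(members, n))
    return combined

# Hypothetical strata: student ID numbers grouped by faculty (assumption).
strata = {
    "engineering": list(range(100)),
    "science": list(range(60)),
    "nursing": list(range(40)),
}
sample = stratified_sample(strata, 0.10, seed=7)
print(len(sample))  # 10 + 6 + 4 = 20
```

Each stratum contributes in proportion to its size, so the sample mirrors the composition of the population on the stratifying variable.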

Advantages of Stratified Sampling


representativeness of the composition of the population is guaranteed.

Disadvantages of Stratified Sampling


more complex sampling plan requiring different sample sizes for each stratum

Cluster sampling
A method by which the population is divided into groups (clusters), any of which can be considered a representative sample. One or more clusters are then selected at random.
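
A minimal sketch of one-stage cluster sampling in Python, assuming ten hypothetical class sections of 30 students each. Whole sections are drawn at random and every student in a chosen section is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: 10 sections with 30 students each (assumption).
clusters = {
    f"section_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 11)
}
sample = cluster_sample(clusters, 3, seed=3)
print(list(sample))
```

Only a list of clusters is needed, not a list of every student, which is the source of the cost savings.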

Advantages of Cluster Sampling


- Economic efficiency: faster and less expensive than SRS
- Does not require a list of all members of the population

Disadvantages of Cluster Sampling


- Cluster specification error: the more homogeneous the cluster chosen, the more imprecise the sample results.

Convenience Sampling
A sample is obtained by selecting individual participants who are easy to approach.

Advantages of Convenience Sampling


- Convenient
- Inexpensive

Disadvantages of Convenience Sampling


- biased

Purposive Sampling
This method starts with a purpose in the researcher's mind, and the sample is thus selected to include participants of interest and exclude those who do not suit the purpose.

Advantages of Purposive Sampling


- Serves the purpose of the research
- Convenient

Disadvantages of Purposive Sampling


- Subjective
- Low generalizability

Quota Sampling
A sample is obtained by identifying subgroups to be included, then establishing quotas for individuals to be selected through convenience for each subgroup.
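
The quota idea can be sketched as follows. The arrival list and the quotas of two participants per subgroup are hypothetical. Participants are taken in the order they happen to arrive (convenience) until each subgroup's quota is full.

```python
def quota_sample(arrivals, quotas):
    """Accept participants in arrival order until each subgroup quota is full."""
    remaining = dict(quotas)
    selected = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            selected.append((person, group))
            remaining[group] -= 1
    return selected

# Hypothetical participants in the order they were approached (assumption).
arrivals = [
    ("p1", "female"), ("p2", "female"), ("p3", "male"),
    ("p4", "female"), ("p5", "male"), ("p6", "male"),
]
sample = quota_sample(arrivals, {"female": 2, "male": 2})
print(sample)
```

No random selection is involved, so the subgroup proportions are controlled but the bias of convenience sampling remains.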

Advantage of Quota Sampling


can ensure that convenience samples will have desired proportion of subgroups

Disadvantage of Quota Sampling


- biased

INFERENTIAL STATISTICS
- Hypothesis and Hypothesis Testing
- Level of Significance
- Directional and Non-directional Hypothesis Testing
- Type I and Type II Error
- Parametric and Nonparametric Tests

Research Hypothesis
A hypothesis is an assumption about the population parameter.

A parameter is a characteristic of the population, like its mean or variance. The parameter must be identified before analysis.

Hypothesis Testing
Goal: Make statement(s) regarding unknown population parameter values based on sample data.
Elements of a hypothesis test:
- Null hypothesis (H0)
- Alternative hypothesis (HA)
- Test statistic
- Rejection region (the alpha level)

H0: μ1 = μ2

Null and Alternative Hypotheses


Null Hypothesis (H0)
- Statement regarding the value(s) of unknown parameter(s). Typically implies no association between the explanatory and response variables in the study.

H0: μ1 = μ2

Alternative Hypothesis (HA)
- Statement contradictory to the null hypothesis (will always contain an inequality).

HA: μ1 ≠ μ2

The Alpha Level (α)


A probability value that is used to define the very unlikely sample outcomes if the null hypothesis is true.
α = .05   α = .01

The most unlikely 5% (or 1%) of the sample means (the extreme values) are separated from the most likely 95% (or 99%) of the sample means (the central values).

Critical Region

Critical Value
Value or values that separate the critical region (where we reject the null hypothesis) from the values of the test statistics that do not lead to a rejection of the null hypothesis
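
For a z test, the critical values follow directly from the alpha level. The sketch below uses Python's standard-library `NormalDist`. The helper name `critical_z` is an illustrative choice, not something from the presentation.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Return the positive z score that cuts off the critical region."""
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(critical_z(0.05), 2))           # two-tailed, alpha = .05 -> 1.96
print(round(critical_z(0.01), 3))           # two-tailed, alpha = .01
print(round(critical_z(0.05, tails=1), 3))  # one-tailed, alpha = .05
```

In a two-tailed test the critical values are the positive and negative versions of the returned z score.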

[Figure: a critical value (z score) separating the "Reject H0" region from the "Fail to reject H0" region]

Two-tailed, Right-tailed, Left-tailed Tests


The tails in a distribution are the extreme regions bounded by critical values.

Two-tailed Test

H0: μ = 100
H1: μ ≠ 100

α is divided equally between the two tails of the critical region.
≠ means less than or greater than.

[Figure: "Reject H0" in both tails and "Fail to reject H0" in the center around 100; values in either tail differ significantly from 100]

Right-tailed Test

H0: μ ≤ 100
H1: μ > 100

The inequality sign in H1 points right, toward the critical region.

[Figure: "Fail to reject H0" on the left and "Reject H0" in the right tail, above 100; values in the right tail differ significantly from 100]

Left-tailed Test

H0: μ ≥ 100
H1: μ < 100

The inequality sign in H1 points left, toward the critical region.

[Figure: "Reject H0" in the left tail, below 100, and "Fail to reject H0" on the right; values in the left tail differ significantly from 100]
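
The three kinds of test differ only in where the critical region lies. As a rough sketch, assuming the test statistic is a z score (the function name `decide` and the example z values are hypothetical):

```python
from statistics import NormalDist

def decide(z, alpha=0.05, tail="two"):
    """Return the decision for an observed z statistic."""
    std = NormalDist()
    if tail == "two":
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)  # alpha split across tails
    elif tail == "right":
        reject = z > std.inv_cdf(1 - alpha)           # all of alpha on the right
    elif tail == "left":
        reject = z < std.inv_cdf(alpha)               # all of alpha on the left
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return "Reject H0" if reject else "Fail to reject H0"

print(decide(1.80))                # Fail to reject H0 (1.80 < 1.96)
print(decide(1.80, tail="right"))  # Reject H0 (1.80 > 1.645)
print(decide(-2.10, tail="left"))  # Reject H0 (-2.10 < -1.645)
```

The same observed value can lead to different decisions depending on whether the hypothesis is directional.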

Conclusions in Hypothesis Testing


Always test the null hypothesis. There are two possible conclusions:
1. Reject H0
2. Fail to reject H0

The final conclusion needs to be worded correctly (e.g. "fail to reject H0" rather than "accept H0").

Type I Error
The mistake of rejecting the null hypothesis when it is true.

α (alpha) is used to represent the probability of a type I error.

Example: Rejecting a claim that the group mean score equals 96 when the mean really does equal 96.

Type II Error
The mistake of failing to reject the null hypothesis when it is false.

β (beta) is used to represent the probability of a type II error.

Example: Failing to reject the claim that the group mean score is 96 when the mean is really different from 96.
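
Both error rates can be estimated by simulation. The sketch below tests H0: μ = 96 with a two-tailed z test. The sample size of 25, the standard deviation of 10, and the alternative true mean of 100 are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mu, mu0=96, sigma=10, n=25, alpha=0.05,
                   trials=2000, seed=0):
    """Share of simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_mu=96)        # H0 true: rejections are errors
type_ii_rate = 1 - rejection_rate(true_mu=100)  # H0 false: non-rejections are errors
print(f"Type I rate ~ {type_i_rate:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

When H0 is true, the rejection rate should land close to α. The Type II rate depends on how far the true mean is from 96 and on the sample size.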

Inferential Statistics
Parametric Tests
- Assume a normal distribution
- Ratio or interval scale
- Random sampling
- Examples: t-test, ANOVA

Nonparametric Tests
- Do not require normality
- Ordinal or nominal scale
- Example: Pearson's chi-square

t-tests
Compare two sets of mean values:
1. One-sample t-test
2. Two independent samples t-test
3. Two paired (dependent) samples t-test

One group t-test


Examines whether a sample mean value is different from a pre-set value.
Example: Is the students' mean TOEFL score higher or lower than 500?

Formulating a null and research hypothesis

H0: The students' mean TOEFL score is equal to 500.
HA: The students' mean TOEFL score is different from 500.

Students' Individual Scores


500 530 440 450 460 485 465 510 490 495 500 505 430 470 500 510 490 485 520 475 460 490 465 520

Output Data
Significant at p = .011 (p < .05)
Reject H0: the students' mean TOEFL score is different from 500.
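
The test statistic behind this output can be recomputed from the 24 scores listed above. The sketch uses only the Python standard library. The critical value 2.069 (two-tailed, α = .05, df = 23) is taken from a standard t table rather than from the slides.

```python
from math import sqrt
from statistics import mean, stdev

scores = [500, 530, 440, 450, 460, 485, 465, 510, 490, 495, 500, 505,
          430, 470, 500, 510, 490, 485, 520, 475, 460, 490, 465, 520]

mu0 = 500  # pre-set value under H0
n = len(scores)
t = (mean(scores) - mu0) / (stdev(scores) / sqrt(n))

# Two-tailed critical t for df = 23 at alpha = .05 (standard t table).
T_CRIT = 2.069

print(f"mean = {mean(scores):.2f}, t({n - 1}) = {t:.2f}")
print("Reject H0" if abs(t) > T_CRIT else "Fail to reject H0")
```

The sample mean is about 485.21 and t is about -2.78, which exceeds the critical value in absolute terms, in line with the decision to reject H0.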

Dependent-sample t-test
Compares the two mean scores of the same participants in one group, as in a pre-test/posttest design.
Example:

Are the students' individual scores on the pre-test and posttest different?

Formulating a null and research hypothesis

H0: There is no difference between the mean scores of the pre-test and posttest.
HA: The students' mean score on the posttest is higher than on the pre-test.

Data Output for dependent t-test

Significant at p = .025 (p < .05)


Reject H0: the students' mean score on the posttest is higher than on the pre-test.
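
The presentation does not list the pre-test and posttest scores, so the sketch below uses made-up scores for eight students purely to show the calculation. A dependent-sample t-test is a one-sample t-test on the individual differences. The critical value 1.895 (one-tailed, α = .05, df = 7) comes from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same 8 students (not from the presentation).
pre = [55, 60, 48, 70, 65, 52, 58, 62]
post = [58, 66, 50, 71, 70, 55, 57, 68]

diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))

# HA is directional (posttest higher), so the test is right-tailed.
T_CRIT = 1.895
decision = "Reject H0" if t > T_CRIT else "Fail to reject H0"
print(f"t({n - 1}) = {t:.2f}: {decision}")
```

Because each student serves as his or her own control, only the spread of the differences enters the standard error.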

Independent-sample t-test
examines whether the mean values of two independent groups are significantly different.

A researcher wants to know whether the students of his class perform better or worse than students in another class in an English final examination.

Research Hypothesis
H0: There is no difference between the mean scores of the two classes.
HA: The mean scores of the two classes are different.

Not significant: retain H0.
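
As an illustration of how such a result arises, the sketch below runs a pooled-variance t-test on made-up examination scores for two classes of six students. The data are hypothetical, and equal variances are assumed. The critical value 2.228 (two-tailed, α = .05, df = 10) comes from a standard t table.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical final examination scores (not from the presentation).
class_a = [72, 68, 75, 80, 65, 70]
class_b = [70, 74, 66, 78, 71, 75]

n_a, n_b = len(class_a), len(class_b)
pooled_var = ((n_a - 1) * variance(class_a) +
              (n_b - 1) * variance(class_b)) / (n_a + n_b - 2)
t = (mean(class_a) - mean(class_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

T_CRIT = 2.228
decision = "Reject H0" if abs(t) > T_CRIT else "Retain H0"
print(f"t({n_a + n_b - 2}) = {t:.2f}: {decision}")
```

With these scores t is about -0.24, well inside the critical values, so H0 is retained.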

One-Way ANOVA
- The response variable is the variable you're comparing.
- The factor variable is the categorical variable being used to define the groups.
- We will assume k samples (groups).
- It is called "one-way" because each value is classified in exactly one way.
- Examples include comparisons by gender, race, political party, color, etc.

One-Way ANOVA
Determines whether there is any significant difference in the mean values among sample groups.
Why not repeated t-tests?
1. One-way ANOVA can handle the comparison of more than two groups at one time.
2. The more tests that are done, the higher the risk of a Type I error.
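
The growth in Type I error risk can be made concrete. Assuming, as a simplification, that the pairwise t-tests are independent, the chance of at least one false rejection across m tests is 1 - (1 - α)^m:

```python
from math import comb

def familywise_error(k_groups, alpha=0.05):
    """Chance of at least one Type I error across all pairwise t-tests,
    assuming the tests are independent (a simplifying assumption)."""
    m = comb(k_groups, 2)  # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m, fwer = familywise_error(k)
    print(f"{k} groups: {m} t-tests, familywise error ~ {fwer:.3f}")
```

With only three groups, the risk is already about .143 instead of .05.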

Research Hypothesis

H0: All the means are equal.

HA: At least two groups have different mean values.
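
The F statistic for these hypotheses compares the variation between group means with the variation within groups. The three groups below are made-up numbers chosen so the arithmetic is easy to follow. The critical value 5.14 for F(2, 6) at α = .05 comes from a standard F table.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups (not from the presentation).
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f, df_b, df_w = one_way_anova_f(groups)

F_CRIT = 5.14
print(f"F({df_b}, {df_w}) = {f:.2f}")
print("Reject H0" if f > F_CRIT else "Fail to reject H0")
```

Here the between-groups mean square is 19 and the within-groups mean square is 1, so F = 19 and H0 is rejected.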

ANOVA + Post Hoc tests


ANOVA only tells whether at least one pair of mean scores is different, but it does not tell which pair is different. Post hoc tests, e.g. Scheffé's or Tukey's test, will do this job.

Non-parametric Test
Pearson's Chi-square
- Goodness-of-fit test
- Test for Independence

Goodness-of-Fit Test
Compares observed frequencies within groups to their expected frequencies.
H0: The observed frequencies are not different from the expected frequencies.
Research hypothesis: They are different.
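
The chi-square statistic sums (observed - expected)² / expected over the groups. The observed counts below are hypothetical, with equal expected frequencies assumed under H0. The critical value 5.991 (df = 2, α = .05) comes from a standard chi-square table.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for a goodness-of-fit test."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three response categories (assumption).
observed = [30, 50, 40]
expected = [sum(observed) / len(observed)] * len(observed)  # 40 each

chi2 = chi_square_gof(observed, expected)
CHI2_CRIT = 5.991
decision = "Reject H0" if chi2 > CHI2_CRIT else "Fail to reject H0"
print(f"chi-square = {chi2:.2f}: {decision}")
```

With these counts the statistic is 5.00, just short of the critical value, so H0 is not rejected.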

Test of Independence
- Reviews cross-tabulations (contingency tables).
- Are the differences in the responses of two groups statistically significant?
- One-way (goodness-of-fit): observed vs. expected frequencies
- Two-way (independence): one set of observed frequencies vs. another set
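
For a contingency table, each expected frequency is (row total × column total) / grand total. The 2 × 2 table below is hypothetical, and no continuity correction is applied. The critical value 3.841 (df = 1, α = .05) comes from a standard chi-square table.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical responses (yes / no) from two groups (assumption).
table = [[30, 20],
         [20, 30]]
chi2, df = chi_square_independence(table)

CHI2_CRIT = 3.841
print(f"chi-square({df}) = {chi2:.2f}")
print("Reject H0" if chi2 > CHI2_CRIT else "Fail to reject H0")
```

Every expected count here is 25, giving a statistic of 4.00, so the two groups' responses differ significantly at the .05 level.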

Thank you very much
