
Practical Statistical Training

Using SPSS Software

Trainer – Hailegebriel Yirdaw


(PhD Candidate at AAU and University of Gothenburg)
E-mail: hailaenani@gmail.com

Outline of the Training
Basic Concepts in Statistics
Introduction to the Windows of SPSS
Data Entry, Loading and Saving
Data Management
Descriptive Data Analysis
Correlation Analysis
Regression Analysis
◦ Linear regression
◦ Binary logistic regression
◦ Multinomial logistic regression
◦ Ordered logistic regression
Hypothesis Testing
Factor Analysis
Definition and Concept of Statistics
Definition
Statistics is the science of data analysis.
It deals with scientific methods of
◦ collecting,
◦ organizing,
◦ presenting, and
◦ analyzing data
so as to draw valid conclusions and make
reasonable decisions.



Major steps in a statistical investigation
Define the objectives and scope of the survey
Define the population and sampling units
Identify the proper sampling technique and
collect data
Organize the data to have a good overall
picture of the data
Analyze the data (calculate various statistics
of interest)
Make conclusions / predictions based on the
statistics computed from the sample by
applying mathematical statistics and
probability theory
The raw materials for any statistical
analysis are the data.
The data collected from the selected
sample can then be subjected to
statistical analysis, such as:
◦ Descriptive statistics
◦ Inferential statistics



Basic Terminologies in Statistics
Population is the set of all elements that
belong to a certain defined group.
◦ Numerical characteristic of a population is
called a parameter.
Sample is a part (or a subset) of the
population.
◦ Numerical characteristic of a sample is called
a statistic.
Variable is the characteristic of the
individual to be measured or observed.
Variables whose values are determined by
chance are called random variables.
Value refers to either a subject’s relative
standing on a quantitative variable, or a
subject’s classification within a qualitative
variable.
Observations are the individual subjects
(or other objects) that serve as the
source of the data.

Descriptive and Inferential Statistics
Descriptive statistics are statistics used to
summarize or describe a set of observations.
Inferential statistics consists of generalizing
from samples to populations, performing
hypothesis testing, determining
relationships among variables, and making
predictions.



Variables and Types of Data
Qualitative variables are variables that can be
placed into distinct categories, according to
some characteristic or attribute.
For example, gender (male or female).
Quantitative variables are numerical in nature
and can be ordered or ranked. Example: age
is numerical and the values can be ranked.



Variables and Types of Data
Discrete variables assume values
that can be counted.
Continuous variables can assume all
values between any two specific
values. They are obtained by
measuring.



Scale of Measurement of Data
a) Nominal Scale – assigns numbers as a way to label or identify characteristics.
◦ The numbers assigned have no quantitative meaning.
b) Ordinal Scale – the possible categories can be placed in a specific order (rank).
c) Interval Scale – the numbers are obtained as a result of a measurement process.
◦ Have some units of measurement.
◦ Ratios have no meaningful interpretation.
d) Ratio Scale – highest form of measurement precision.
◦ It possesses the additional feature that ratios have a meaningful interpretation.
Activity 1. (5 minutes)
Give examples of the kinds of data that you are
planning to collect in your project/thesis, and
for each describe its scale of measurement.



Data types
Variables
◦ Scale
   – Continuous: measurements; takes any value
   – Discrete: counts / integers
◦ Categorical
   – Ordinal: obvious order
   – Nominal: no meaningful order
Source: www.statstutor.ac.uk


Activity 2. (5 minutes)
What data types relate to the following questions?

Q1: What is your favourite subject?
Maths / English / Science / Art / French

Q2: Gender:
Male / Female

Q3: I consider myself to be good at mathematics:
Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree

Q4: Score in a recent mock GCSE maths exam:
Score between 0% and 100%
What data types relate to the following questions? (Answers)

Q1: What is your favourite subject? (Maths / English / Science / Art / French)
Answer: Nominal

Q2: Gender: (Male / Female)
Answer: Binary / Nominal

Q3: I consider myself to be good at mathematics: (Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree)
Answer: Ordinal

Q4: Score in a recent mock GCSE maths exam: (score between 0% and 100%)
Answer: Scale
Activity 3. (5 minutes)

Identify each of the following as examples of (1) nominal,
(2) ordinal, (3) scale, (4) discrete, or (5) continuous
variables:
1. The length of time until a pain reliever begins
to work.
2. The number of chocolate chips in a cookie.
3. The number of colors used in a statistics
textbook.
4. The brand of refrigerator in a home.
5. The overall satisfaction rating of a new car.
6. The number of files on a computer’s hard disk.
7. The pH level of the water in a swimming pool.
8. The number of staples in a stapler.



Why is scale of measurement important?

Helps to present your data.
Helps to decide which statistical analysis to
use: parametric vs. non-parametric.
◦ If the scales are nominal and/or ordinal, use
non-parametric methods only.
◦ Interval and ratio scales can use both
parametric and non-parametric methods.
The type of graph also depends on the
measurement scale: e.g. bar chart or pie chart for
ordinal/nominal scales, but histogram for
ratio/interval scales.



Source of Data and Method of
Data Collection
Source of Data
There are two main sources of data
Primary source: a source that can supply
first-hand information for immediate use.
Primary data: data originally collected for the
purpose at hand, where an individual, agency or
organization controls the design and data
collection processes.



Secondary source: a source from which data are
obtained from records that were collected by persons
other than the investigator, for other purposes.
Example: hospital records, vital statistics registers, etc.
◦ Secondary data: data obtained from a secondary
source, i.e. data previously collected by others for
their own purposes.



Who in society wants or needs information?

Governments: Federal, provincial and
local governments need information on the
population.
This information is used to develop, implement
and monitor social and economic programs.
Businesses: Most businesses require
information. This information may be about
the economy of a local population or
various social trends.



Who in society wants or needs information?

Community groups: These
organizations need information about a
wide variety of subjects.
Individuals: Everyone, from students to
pensioners, needs some form of information
at some time during their lives. The
information may be used to complete an
essay, a major project or simply to satisfy
one's curiosity.



Populations and samples
Taking a sample from a population

Sample data ‘represents’ the whole population



Sampling Process



Defining the Target Population

It is critical to the success of the research
project to clearly define the target
population.
Rely on logic and judgment.
The population should be defined in
connection with the objectives of the
study.



Census Sample
A census study occurs if the entire
population is very small or it is reasonable
to include the entire population (for
other reasons).

It is called a census sample because data
are gathered on every member of the
population.



Why sample?

The population of interest is usually too
large to attempt to survey all of its
members.
A census may be very expensive.
A census may require too much time.
A carefully chosen sample can be used to
represent the population.
The sample reflects the characteristics of
the population from which it is drawn.



Sampling Frame and Sampling Techniques
Sampling frame: the actual list of individuals
that the sample will be drawn from.
Subjects are selected from the sampling frame.
Example: You are doing research on working
conditions at Company X. Your population is all 1000
employees of the company. Your sampling frame is the
company’s HR database, which lists the names and
contact details of every employee.
Sampling techniques: the strategies used to
obtain a sample for a study.
◦ Probability sampling: each member of the population has a
known non-zero probability of being selected
◦ Nonprobability sampling: members are selected from the
population in some nonrandom manner



Types of Probability Sampling

Simple random sampling

Systematic sampling

Stratified random sampling

Cluster sampling



Simple Random Sampling
◦ Selected by using chance or
random numbers
◦ Each individual subject (human or
otherwise) has an equal chance of
being selected
◦ Examples:
– Drawing names from a hat
– Random number generator
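
The "random number generator" approach can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 1,200 ID numbers, the sample size of 100 and the seed are assumptions made for the example, not values from the training.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)      # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: ID numbers 1..1200
frame = list(range(1, 1201))
srs = simple_random_sample(frame, 100, seed=42)
print(len(srs), "units drawn")
```

Because the draw is without replacement, no unit can appear twice in the sample.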



Systematic Random Sampling
◦ Select a random starting point and then select every kth subject in the
population.
◦ Simple to use, so it is used often.
◦ Example: to select a sample of 100 students from a study population of
1,200, the sampling fraction is: sample size / study population = 100/1200 =
1/12.
◦ The sampling interval is therefore 12.
◦ The number of the first student to be included in the sample is chosen
randomly, for example by blindly picking one out of twelve pieces of
paper, numbered 1 to 12.
◦ If number 6 is picked as a starting number, then every twelfth student
will be included in the sample until 100 students are selected: the
numbers selected would be 6, 18, 30, 42, etc.
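
The selection rule in this example can be sketched in Python as follows. The function name and its parameters are hypothetical; the numbers (population 1,200, sample 100, starting number 6) are the ones used in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions chosen by systematic sampling."""
    k = population_size // sample_size                # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)     # random start in 1..k
    return [start + i * k for i in range(sample_size)]

picked = systematic_sample(1200, 100, start=6)
print(picked[:4])   # [6, 18, 30, 42]
```

With an interval of 12 and a start of 6, the last selected position is 1194, so all 100 selections fall inside the population of 1,200.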



Stratified Sampling
Divide the population into at least two different groups
with common characteristic(s), then draw SOME subjects
from each group (each group is called a stratum; plural: strata).
Results in a more representative sample.
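
A minimal Python sketch of drawing some subjects from each stratum is given below. The two strata ("urban" and "rural"), their sizes and the number drawn from each are made-up values for illustration only.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample separately inside every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

# Hypothetical strata: unit IDs 1..60 are urban, 61..100 are rural
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
sample = stratified_sample(strata, {"urban": 6, "rural": 4}, seed=1)
print({name: len(units) for name, units in sample.items()})
```

Every stratum is guaranteed to be represented, which is what distinguishes this from a single simple random sample of the whole population.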



Cluster Sampling
Cluster sampling is often carried out as 'two-stage sampling':
◦ First stage: a sample of areas is chosen;
◦ Second stage: a sample of respondents within those
areas is selected.
In its basic (one-stage) form: divide the population into groups
(called clusters), randomly select
some of the groups, and then collect
data from ALL members of the
selected groups.
Used extensively by government and
private research organizations.
Example: exit polls
Non-Probability sampling
- Samples are selected deliberately by the
researcher. Items in the population do not have
an equal chance of being selected.
Generally, three conditions need to be met in
order to use non-probability sampling.
First, if there is no desire to generalize to a
population parameter, then there is much less
concern about whether the sample fully reflects
the population, i.e. precise representation is not
necessary.



Non-Probability sampling
Secondly, it is used because of cost and time
requirements.
◦ Probability sampling could be prohibitively
expensive since it calls for more planning and
repeated callbacks to ensure that each selected
sample unit is contacted.
Thirdly, probability sampling may break down in its
application.
◦ The total population may not be available for
the study in certain cases.



Non-Probability sampling

Non-probability sampling methods:


Convenience sampling
Judgmental sampling or Purposive sampling
Case study
Ad hoc quotas
Snowball sampling



Convenience Sampling

Use subjects that are easily accessible


Examples:
Using family members or students in a classroom
Mall shoppers



Convenience sampling
• The method selects anyone who is convenient.
• It can produce ineffective, highly unrepresentative
samples and is not recommended.
• Such samples are cheap; however, they are biased and full of
systematic errors.
• Example: the person-on-the-street interview
conducted by television programs is an example of a
convenience sample.



Purposive Sampling
Also called judgmental or selective sampling
Efforts are made to include typical or atypical
subjects.
Sampling is based on the researcher’s judgment.
Using personal judgment to select a sample
that should be representative, OR selecting
those who are known to have the needed
information.



Quota Sampling
Quota sampling is the nonprobability equivalent of
stratified sampling.
◦ First identify the strata and their proportions as
they are represented in the population.
◦ Then convenience or judgment sampling is used to
select the required number of subjects from each
stratum.



Network or Snowball Sampling
Takes advantage of social networks to get
the sample.
One person in the sample asks another
to join the sample, and so on.



Method of Data Collection
There are several methods of collecting
primary data. The most important include:
1) Direct observation: collecting data by
simple observation. It usually involves
counting the data of interest in person.
2) Personal interview: involves presentation
of oral-verbal stimuli and reply in terms of
oral-verbal responses.
a) Face-to-face
b) Telephone



3) Self-completed (written questionnaire):
written questions are mailed or hand-
delivered to respondents.
a) Mail survey
b) Hand-delivered questionnaire

Activity 4. (5 minutes)
Which method(s) of data collection is/are
appropriate to use in your project work? Why?



Tips on Questionnaire Design
The introduction should be informative
and stimulate respondents’ interest:
◦ interviewers give the respondent their name
and provide identification;
◦ explain that a survey is being conducted;
◦ describe the survey's purpose;
◦ give the respondent time to read or be
informed about confidentiality issues; etc



Tips on Questionnaire Design
The questions should read well and have a good
flow.
– The words should be simple, direct and familiar
to all respondents.
– The questions should be clear and as specific as
possible.
– Questions should not be double-barreled.
• Example: "Does your company provide training for
new employees and re-training for existing staff?"
This example is double-barreled as it asks two
questions rather than one.



Tips on Questionnaire Design
Questions should not be leading.
If the questions are close-ended, the
response categories should be mutually
exclusive and exhaustive.
Open-ended questions give respondents an
opportunity to answer the question in their
own words.
Close-ended questions give respondents
a choice of answers, and the respondent is
supposed to select one.
Sample size determination

Statistical Estimation
Point estimate: the single value of a statistic
calculated from a sample.
Interval estimate: a range of values calculated
from sample statistic(s) and a standardized statistic,
such as Z.
◦ Selection of the standardized statistic is determined
by the sampling distribution.
◦ Selection of critical values of the standardized
statistic is determined by the desired level of
confidence.



Confidence Interval to Estimate µ
when n is Large

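Point estimate: $\bar{X} = \dfrac{\sum X}{n}$

Interval estimate: $\bar{X} \pm Z \dfrac{\sigma}{\sqrt{n}}$
or
$\bar{X} - Z \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z \dfrac{\sigma}{\sqrt{n}}$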



Distribution of Sample Means
for 100(1−α)% Confidence

[Figure: sampling distribution of $\bar{X}$ centered at µ (Z = 0). The central area between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is 1−α, i.e. (1−α)/2 on each side of µ; each tail has area α/2.]



Probability Interpretation of the Level
of Confidence

$\Pr\left[\bar{X} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$



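Distribution of Sample Means
for 95% Confidence

[Figure: sampling distribution of $\bar{X}$ centered at µ (Z = 0). The central 95% lies between Z = −1.96 and Z = 1.96, with area 0.4750 on each side of µ and 0.025 in each tail.]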



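Example: 95% Confidence
Interval for µ
$\bar{X} = 4.26$, $\sigma = 1.1$, and $n = 60$.

$\bar{X} - Z\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}$
$4.26 - 1.96\dfrac{1.1}{\sqrt{60}} \le \mu \le 4.26 + 1.96\dfrac{1.1}{\sqrt{60}}$
$4.26 - 0.28 \le \mu \le 4.26 + 0.28$
$3.98 \le \mu \le 4.54$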
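
The arithmetic of this example can be checked with a short Python sketch. The function name is hypothetical; the inputs (mean 4.26, σ = 1.1, n = 60, Z = 1.96) are the ones from the example.

```python
import math

def z_confidence_interval(mean, sigma, n, z=1.96):
    """Large-sample confidence interval for the mean: mean ± z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = z_confidence_interval(4.26, 1.1, 60)
print(round(lower, 2), round(upper, 2))   # 3.98 4.54
```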
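Confidence Interval to Estimate µ
when n is Large and σ is Unknown

$\bar{X} \pm Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$
or
$\bar{X} - Z_{\alpha/2}\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$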



Z Values for Some of the More
Common Levels of Confidence

Confidence Level    Z Value
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.575
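
These critical values can also be computed instead of looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_(alpha/2) of the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```

The computed values agree with the table up to rounding; printed tables commonly show 2.33 and 2.575 where the computation gives about 2.326 and 2.576.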



Confidence Interval to Estimate
the Population Proportion

$\hat{p} - Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le P \le \hat{p} + Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
P = population proportion
n = sample size
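
As a worked illustration of the interval for a proportion, the Python sketch below uses made-up data (120 respondents with the attribute out of a sample of 400) and Z = 1.96 for 95% confidence. The function name and the data are assumptions for the example.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Interval p_hat ± z * sqrt(p_hat * q_hat / n) for a population proportion."""
    p_hat = successes / n          # sample proportion
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 of 400 respondents have the attribute
low, high = proportion_confidence_interval(120, 400)
print(round(low, 3), round(high, 3))   # 0.255 0.345
```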



What sample size is needed?
The answer to this question is influenced by
a number of factors, including:
◦ the purpose of the study, population size, the
risk of selecting a “bad” sample and the
allowable sampling error;
◦ the data analysis plan, e.g. the number of cells one
will have in a cross-tabulation;
◦ most of all, whether one is undertaking a
qualitative or quantitative study.



Sample size determination in quantitative study

To determine the appropriate sample size, the
basic factors to be considered are:
◦ Level of precision,
◦ Level of confidence or risk,
◦ Degree of variability in the attributes being
measured ( prevalence)



Sample size determination…
The Level of Precision (sometimes called
sampling error)
◦ the range in which the true value of the
population is estimated to be.
◦ This range is often expressed in
percentage points (e.g., ±5 percent).
The Confidence Level
◦ based on ideas encompassed under the
Central Limit Theorem.
◦ E.g. if a 95% confidence level is selected, 95
out of 100 samples will have the true
population value within the range of
precision.
Sample size determination…
Degree of Variability
◦ refers to the distribution of attributes in
the population.
◦ The more heterogeneous a population, the
larger the sample size required to obtain a
given level of precision.
◦ The less variable (more homogeneous) a
population, the smaller the sample size.



Sample size determination…
• A proportion of 50 % indicates a greater level
of variability than either 20% or 80%. This is
because 20% and 80% indicate that a large
majority do not or do, respectively, have the
attribute of interest.
• Because a proportion of 0.5 indicates the
maximum variability in a population, it is often
used in determining a more conservative
sample size, that is, the sample size may be
larger than if the true variability of the
population attribute were used.
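
One way to see why 0.5 is the most conservative choice is to look at the product p(1 − p), which is the variance of a yes/no attribute. The short Python sketch below is an illustration under that assumption; the function name is hypothetical.

```python
def variability(p):
    """Variance p * (1 - p) of a yes/no attribute with proportion p."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(p, round(variability(p), 2))
```

The product is largest at p = 0.5 and equal for 20% and 80%, so a sample size computed with p = 0.5 is at least as large as one computed with any other proportion.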
Sample size determination…
Sample size affects the accuracy of representation:
a larger sample means less chance of error.
As a rule of thumb, the minimum suggested sample is 30
and the upper limit is 1,000.



Strategies for Determining Sample Size

There are several approaches to determining
the sample size:
◦ Using a census for small populations
◦ Imitating the sample size of similar studies
◦ Using published tables
◦ Applying formulas to calculate a sample size
◦ Using computer software, e.g. the Epi Info series



Using a Census for Small Populations
• One approach is to use the entire population as the
sample.
• Cost considerations make this impossible
for large populations, but it is attractive for small
populations (e.g., 200 or less).
• It eliminates sampling error and provides data on all
the individuals in the population.
• Some costs, such as questionnaire design and
developing the sampling frame, are “fixed,” that is,
they will be the same for samples of 50 or 200.
• Finally, virtually the entire population would have to
be sampled in small populations to achieve a
desirable level of precision.



Using a Sample Size of a Similar Study

Use the same sample size as those of studies
similar to the one you plan (cite the reference).
Without reviewing the procedures employed in
these studies, you may run the risk of repeating
errors that were made in determining the sample
size for another study.
However, a review of the literature in your
discipline can provide guidance about “typical”
sample sizes that are used.



Using Published Tables

• Published tables provide the sample size for
a given set of criteria.
• They present the sample sizes that would be necessary
for given combinations of precision, confidence levels
and variability.
• The sample sizes presume that the
attributes being measured are distributed
normally or nearly so.
• Although tables can provide a useful guide
for determining the sample size, you may
need to calculate the necessary sample size
for a different combination of levels of
precision, confidence, and variability.
Sample Size for ±5%, ±7% and ±10% Precision Levels
where Confidence Level Is 95% and P=0.5.

Size of         Sample Size (n) for Precision (e) of:
Population      ±5%     ±7%     ±10%
100             81      67      51
125             96      78      56
150             110     86      61
175             122     94      64
200             134     101     67
225             144     107     70
250             154     112     72
275             163     117     74
300             172     121     76
325             180     125     77
350             187     129     78
375             194     132     80
400             201     135     81
425             207     138     82
450             212     140     82



Using Formulas to Calculate a Sample Size

Sample size can be determined by the
application of one of several mathematical
formulae.
The formulas most commonly used are those for
calculating a sample size for proportions.
For example:
For populations that are large (infinite), the
Cochran (1963) equation yields
a representative sample for proportions.



Cochran equation

$n_0 = \dfrac{Z^2 p q}{e^2}$

where $n_0$ is the sample size;
Z is the selected critical value for the desired
confidence level
(1 − α equals the desired confidence level, e.g.,
95%);
the value for Z is found in statistical tables which
contain the area under the normal curve,
e.g. Z = 1.96 for the 95% level of confidence;
e is the desired level of precision;
p is the estimated proportion of an attribute that
is present in the population, and q is 1 − p.
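
A minimal Python sketch of this calculation follows, assuming the usual form of Cochran's equation, $n_0 = Z^2 p q / e^2$. The function name and the default inputs (Z = 1.96, p = 0.5, e = 0.05) are illustrative choices, not prescriptions.

```python
import math

def cochran_n0(p=0.5, e=0.05, z=1.96):
    """Cochran sample size for a proportion in a large population: z^2 * p * q / e^2."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

n0 = cochran_n0()
print(math.ceil(n0))   # 385
```

The result is rounded up because a sample cannot contain a fraction of a respondent.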



Cochran’s formula for calculating sample
size when population size is finite:

With finite populations, a correction for
proportions is necessary.
If the population is small, then the sample size can
be reduced slightly.
This is because a given sample size provides
proportionately more information for a small
population than for a large population.
The sample size (n0) can thus be adjusted using
the corrected formula.



Cochran’s formula for calculating sample
size when population size is finite:

$n = \dfrac{n_0}{1 + \dfrac{n_0 - 1}{N}}$

where n is the sample size,
N is the population size, and
$n_0$ is the calculated sample size for an
infinite population.
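
The correction can be sketched in Python as below, assuming the usual form $n = n_0 / (1 + (n_0 - 1)/N)$. The inputs are illustrative: $n_0$ = 384.16 is the infinite-population value for Z = 1.96, p = 0.5 and e = 0.05, and the population size of 1,000 is made up.

```python
import math

def finite_population_sample_size(n0, population_size):
    """Adjust an infinite-population sample size n0 for a population of size N."""
    return n0 / (1 + (n0 - 1) / population_size)

n = finite_population_sample_size(384.16, 1000)   # hypothetical N = 1000
print(math.ceil(n))   # 278
```

For very large N the adjusted value approaches $n_0$, which is why the correction only matters for small populations.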



Yamane’s formula for calculating
sample size

It is a simplified formula for proportions.
Yamane (1967) provides a simplified
formula to calculate sample sizes:

$n = \dfrac{N}{1 + N e^2}$

where n is the sample size,
N is the population size, and
e is the level of precision.
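
A minimal Python sketch, assuming the usual form of Yamane's formula, $n = N / (1 + N e^2)$. The population size of 1,000 and the precision of ±5% are made-up inputs for illustration.

```python
import math

def yamane_sample_size(population_size, e=0.05):
    """Yamane's simplified sample size: N / (1 + N * e^2)."""
    return population_size / (1 + population_size * e ** 2)

n = yamane_sample_size(1000, e=0.05)   # hypothetical N = 1000
print(math.ceil(n))   # 286
```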



Note
The sample size formulas provide the number
of responses that need to be obtained.
Many researchers commonly add 10% to the
sample size to compensate for persons that the
researcher is unable to contact.
The sample size is also often increased by 30%
to compensate for non-response
(e.g. in self-administered questionnaires).



Formula For Sample Size For The Mean

The use of tables and formulas to determine
sample size in the above discussion employed
proportions.
There are two methods to determine sample
size for variables that are polytomous or
continuous.
One method is to combine responses into two
categories and then use a sample size based on
proportions.
The second method is to use the formula for
the sample size for the mean.



Determining Sample Size when Estimating µ

Z formula: $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Error of estimation (tolerable error): $E = \bar{X} - \mu$

Estimated sample size: $n = \dfrac{Z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2$

$\sigma^2$ is the variance of an attribute in the population.
E is the desired level of precision.
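
The sample size for a mean, $n = (Z_{\alpha/2}\,\sigma / E)^2$, can be computed as in the sketch below. The inputs are assumptions chosen for illustration: σ = 1.1, a tolerable error of E = 0.2 and Z = 1.96 for 95% confidence.

```python
import math

def sample_size_for_mean(sigma, error, z=1.96):
    """Sample size needed to estimate a mean within ±error: (z * sigma / error)^2."""
    return (z * sigma / error) ** 2

n = sample_size_for_mean(sigma=1.1, error=0.2)   # hypothetical inputs
print(math.ceil(n))   # 117
```

Halving the tolerable error E roughly quadruples the required sample size, since E enters the formula squared.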
Use of software in sample size determination
Depending on the type of study and the specific software,
some information will be required:
population size, population standard
deviation, acceptable sampling error, confidence
level, Z-value, power of the study, etc.
80% power in a clinical trial means that the
study has an 80% chance of ending up with a p-value
of less than 5% in a statistical test (i.e. a
statistically significant treatment effect) if there
really was an important difference (e.g. 10% versus
5% mortality) between treatments.



Further considerations
The above approaches to determining sample
size have assumed that a simple random sample
is the sampling design.
For more complex designs, e.g. case-control studies,
one must take into account the variances of
sub-populations, strata, or clusters before an
estimate of the variability in the population as a
whole can be made.



Sample size through proportional
allocation method
The proportional allocation method was originally
proposed by Bowley (1926).
In this method, the sampling fraction n/N is the same in all
strata.
A given sample of size n is allocated to the different
strata in proportion to their sizes:

$n_i = n \dfrac{N_i}{N}, \quad i = 1, 2, 3$

where n represents the sample size,
$N_i$ represents the population size of the
i-th stratum, and
N represents the population size.
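
Proportional allocation, $n_i = n N_i / N$, can be sketched in Python as follows. The three strata and their sizes are made-up values for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of size n to strata in proportion to their sizes."""
    N = sum(stratum_sizes.values())            # total population size
    return {name: n * Ni / N for name, Ni in stratum_sizes.items()}

# Hypothetical strata sizes N_i
strata_sizes = {"A": 500, "B": 300, "C": 200}
allocation = proportional_allocation(strata_sizes, 100)
print(allocation)   # {'A': 50.0, 'B': 30.0, 'C': 20.0}
```

When the allocations are not whole numbers they have to be rounded, and the rounded values should be checked so that they still add up to n.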
Sample Size Adequacy in Factor Analysis
It depends on the number of variables, the number of
factors and the loadings.
KMO: measure of sampling adequacy (MSA). This
index ranges from 0 to 1.
KMO values: 0.5–0.7 mediocre; 0.7–0.8 good; 0.8–0.9
great; 0.9 and above marvelous.



Sample Size Adequacy in Factor Analysis
It depends: see the recommendations regarding sample size in
factor analysis that have been made in the literature.
These are usually stated in terms of either the minimum
sample size (N) for a particular analysis or the minimum
ratio of N to the number of variables, p, i.e. the number
of survey items being subjected to factor analysis
(MacCallum et al. 1999).
Gorsuch (1983) recommended five subjects per item,
with a minimum of 100 subjects, regardless of the
number of items.
Guilford (1954) argued that N should be at least 200.
Cattell (1978) recommended three to six subjects per
item, with a minimum of 250.



Sample Size Adequacy in Factor Analysis
More demanding recommendations for sample size
require a minimum of 10 subjects (observations) per
item (variable) (Everitt 1975), or
a large sample, ideally several hundred (Cureton &
D’Agostino, 1983).
Comrey and Lee (1992) provided the following advice
regarding sample size: 50 cases is very poor, 100 is poor,
200 is fair, 300 is good, 500 is very good, and 1,000 or
more is excellent.



Types of Classification of data

Broadly, there are three types of data:
1. Cross-sectional data: data on one or more
variables collected at a single point in time.
2. Time series data: data that have been
collected over a period of time on one or
more variables.
3. Panel data: data that have the dimensions of both time
series and cross-sections, i.e.
data collected for the same sample (individual units)
at repeated time points.



Activity 5. (4 minutes)

Give your own example of each type of
data classification and of variables from
your surroundings.

The End

