Professional Documents
Culture Documents
sample theory
Data are collected for many purposes. They may be required to provide a
description of a particular occurrence (e.g. the distribution of parking
durations); they may be collected to test a particular hypothesis (e.g. do the
speeds of vehicles on a road change after the introduction of a public transport
priority scheme?); or they may be collected to learn something that was
previously unknown (e.g. what are the factors influencing the speed of traffic
on a residential street?). This chapter addresses the question of setting up
experiments to investigate these issues. It discusses appropriate experimental
design before turning to the choice of sample design and the determination of
appropriate sample sizes.
The simplest set of experiments involves the comparison of one variable over
two situations. This can involve studies over time or space. A common
example is the before and after study. Experiments of this type can use
random data or data that are grouped in some manner.
Since the total population conceptually contains all values that can occur
from a given observation, any set of observations collected from the
population is considered a sample. An important statistical idea is that of
random data: in certain circumstances, a set of observations may be regarded
as randomly drawn from the total population, i.e. there is an equal chance of
drawing each element of the sample from the population. This basic
150 Understanding Traffic Systems
Random data are collected as described in the preceding section, but for
more than one alternative. Examples of situations appropriate to this category
might include studies with a number of before and after surveys, or
comparisons of travel times over a number of routes. The analysis of these
data is usually carried out using analysis of variance techniques.
The concept of pairing the observations, discussed in the preceding
section, is a special case of blocked designs. As stated earlier, a block is a
portion of the experimental material that is expected to be more
homogeneous than the aggregate. In a blocked design two kinds of effects
are contemplated: firstly those of the alternatives which are of major
concern to the experimenter; and secondly those of blocks whose
contribution we wish to reduce. A randomised block design has the
advantages of providing the opportunity to eliminate block to block
variations while providing a wide inductive space for experiments run with
uniform raw material - several alternatives can be run with several different
blockings.
Blocking may involve running experiments close in time or in space. For
example, in a study of the effectiveness of accident countermeasures, spatial
blocking could be used to remove seasonal effects. In the following example
the blocking design for a test of the robustness of different brands of tyre
makes use of the fact that a test car can be equipped with a different brand of
tyre on each wheel. Determination of the amount of wear on each of four
brands of tyre (A - D) could be investigated by carrying out a series of tests
with each brand of tyre fitted successively to each wheel of the car (see Table
7.1).
Analysis of randomised block designs can be carried out using analysis
of variance. In some situations it may not be possible to get the number of
tests equalling the number of alternatives. For instance, it may only be
possible to obtain three tests for each blocking when there are four
alternatives. Table 7.2 presents a possible blocking for this situation. Once
again the random design reduces potential bias due to interaction effects
due does not altogether eliminate it.
This interaction bias could be eliminated by including all permutations.
Using the example of Table 7.2 this would require 24 (= 4! = 4×3×2×1)
tests, but clearly this ‘full factorial’ design is an expensive option. An
alternative, cheaper, approach is based on the so-called latin square. In
mathematical terms a latin square is a data matrix containing n×n cells.
Each cell contains one of n specific numbers so that they appear once and
only once on each row and column.
152 Understanding Traffic Systems
1 2 3 4 1 2 3 4
Front left A B C D A B C D
Front right B C D A B D A C
Rear left C D A B C A D B
Rear right D A B C D C B A
Table 7.3 Latin squares design for car emissions using 4 drivers and
4 cars
Driver Car
1 2 3 4
I A B C D
II C D A B
III B C D A
IV D A B C
The sampling frame is the ‘register’ of the target population. It is some list
or definition which defines and contains the members (units) of the target
population. A complete definition of the sampling frame is required for
sampling to proceed, and for any inferences to be made about the target
population from the sample results. The actual choice of the sampling
frame depends on the needs and constraints of the particular study. For
instance, the sampling frame might consist of the list of vehicles registered
as being garaged in a certain area, or it could be all those vehicles using a
particular road or junction during a specified time period. The main
requirement is that the sampling frame can be quantifiably identified so that
the actual size of the target population may be determined.
Two main methods exist for selecting samples from a target population.
These are judgement sampling and random sampling. In random sampling,
all members of the target population have a chance of being selected in the
sample, whereas judgement sampling uses personal knowledge, expertise
and opinion to identify sample members. Judgement samples have a certain
convenience. They may have a particular role, by way of ‘case studies’ of
particular phenomena or behaviours. The difficulty is that judgement
samples have no statistical meaning; they cannot represent the target
population. Statistical techniques cannot be applied to them to produce
useful results, for they are almost certainly biased. There is a particular role
for judgement sampling in exploratory or pilot surveys where the intention
is to examine the possible extremes of outcomes with minimal resources.
However, in order to go beyond such an exploration, the investigator
cannot attempt to select ‘typical’ members or exclude ‘atypical’ members
of a population, or to seek sampling by convenience or desire (choosing
Experimental Design and Sample Theory 157
There is always a possibility that the sample does not adequately reflect the
nature of the parent population. Random fluctuations (‘errors’) which are
inherent in the sampling process are not serious because they can be
quantified and allowed for using statistical methods. However, if due to
poor experimental design or survey execution, there is a systematic pattern
to the errors, this will introduce bias into the data and, unless it can be
detected, this will distort the analysis. A principal objective of statistical
theory is to infer valid conclusions about a population from unbiased
sample data, bearing in mind the inherent variability introduced by
sampling. The following summary provides a guide to the design of
samples for traffic data collection.
A distribution of all the possible means of samples drawn from a target
population is known as a sampling distribution. It can be partially described
by its mean and standard deviation. The standard deviation of the sampling
distribution is known as the ‘standard error’. It takes account of the
anticipated amount of random variation inherent in the sampling process
and can therefore be use to determine the precision of a given estimate of a
population parameter from the sample.
s
sx = (7.1)
n
Np-n s
sx = (7.2)
Np -1 n
n = α s
2 2
( 2 + u2 )
(7.3)
2 d2
Table 7.6 Values of α2(2 + u2) for use in sample size determination,
when determining percentile estimates with level of
confidence α per cent
especially when testing statistical hypotheses? These issues are dealt with
at length in subsequent chapters.