Introductory notes in
transport planning and
travel demand modelling
2 DATA COLLECTION
2.1 Introduction to survey planning
2.2 Sampling methods
2.2.1 Sample design
2.2.2 Sampling methods
2.3 Errors in data collection and modelling
2.3.1 Types of error
2.3.2 Model complexity versus data accuracy
2.4 Data collection
2.4.1 Household-based survey
2.4.2 Non-household based survey
2.4.3 Data correction, expansion and validation
2.4.4 Stated preference surveys
2.4.5 Longitudinal surveys
1.1.1 Introduction
In a community, decisions made by decision-makers are mostly based on plans that provide information
like forecasts on future developments in a certain policy area. Transport is one of those policy areas. It is
of major importance to the community as the economic and social health of an area depends on the
performance of the transport system.
Urban transport planning started in the United States in the 1950s with the Detroit and Chicago
Transport Studies, and was used to inform decision-makers on the transport system. Urban transport
planning analyses the transport system, gives forecasts on future performance of the system and
suggests measures to improve this performance in order to meet the level desired.
In earlier times the studies were mainly concerned with the provision of capacity for the growing demand
of motorcar travel. Now, after nearly fifty years, the major concern is about the environmental effects,
and studies are focussing on how to restrain the growth of motorcar travel, with transport pricing as the
main objective.
The rise of urban transport planning included the start of the development of transport models, as they
are an essential component of urban transport planning. The techniques developed in the United States
were imported to the United Kingdom in the 1960s, followed by important theoretical developments in
both the United States and Europe in the next twenty years.
Although the development of transport models has been evolutionary rather than revolutionary, two
important changes have taken place:
- A theoretical framework was developed, compatible with economic theory, providing a justification
and clarification of methods that were originally proposed on practical grounds.
- The major increase in computing power made it possible to analyse problems at a significantly larger
scale and level of detail.
1 Based on: M.D. Meyer & E.J. Miller, Urban transportation planning, 2001, p. 1-35, 256-261; J. de D. Ortúzar & L.G.
Willumsen, Modelling transport, 1994, p. 1-32; R. Tolley & B. Turton, Transport systems, policy and planning, 1995, p.
197-210; E.A. Beimborn, A transportation modelling primer, 1995; M. Taylor, W. Young & P. Bonsall, Understanding
traffic systems, Data, analysis and presentation, 1996, p. 32-33
To understand a system it has to be analysed. The basic components of this analysis are:
- Definition: what problem is the plan intended to solve?
- Projection: how will the situation develop if the problem continues?
- Constraints: what are the limits of finance, time etc. within which planning must take place?
- Options: what are the alternatives and their pros and cons?
- Formulation: what are the main alternative plans, i.e. packages of available options within the
prevailing constraints?
- Testing: how would each of the alternative plans work out in practice?
- Evaluation: which plan gives greatest value (within the constraints) in terms of solving the problems
already defined?
The results and proposals of this analysis would be fed back into the political process for appraisal. Often
adaptations will have to be made to the plan, so the process can be seen as a learning one.
The basic stages of the process are the pre-analysis, technical analysis and post-analysis phase (figure
1.1).
After that data need to be collected on land use, transport and travel inventories, distribution of land
use, current travel patterns, preferred travel modes and socio-economic situation. More on data
collection in chapter 2.
The urban transport modelling system consists of a set of submodels which tackle the problem in four
stages:
1 Whether to make a trip (trip generation)
2 Where to go (trip distribution)
3 Which mode of transport to use (modal split)
4 Which route to use (traffic assignment)
As this lecture note is mainly on modelling transport, the following sections and chapters will concentrate
on this four-stage model.
A model can be defined as a (simplified) representation of a part of the real world – the system of
interest – which concentrates on certain elements considered important for its analysis from a particular
point of view. There are physical and abstract models. The former is used, for example, in architecture.
The latter includes the analytical models that are used in transport planning.
Models are necessary in transport planning because it is impossible to conduct experiments on existing
infrastructure, and, of course, on non-existing infrastructure and transport modes (e.g. roads, rail track,
a new mode of bus transport).
Analytical models attempt to replicate the system of interest and its behaviour by means of
mathematical equations. These equations are based on theoretical statements about the system and its
behaviour. The models are often complex and large amounts of data are required. The value of these
models is limited to a range of problems under specific conditions.
As we cannot predict the effects of future developments on transport, models can be used to do this. The
mathematical equations depend on a range of variables that might change in the future. A model is
designed with base-year data but if it is adequate, it can also be solved for future values of the variables,
which delivers the desired forecasts. However, one should always keep in mind that models are subject
to bias when used for forecasting.
1. Formulation of the problem. A problem can be defined as a mismatch between expectations and
perceived reality. The formal definition of a transport problem requires reference to objectives,
standards and constraints. The objectives are a definition of an ideal but achievable future state.
Standards are provided to compare whether minimum performance is being achieved at different
levels of interest (e.g. if all signalised junctions in a city operate at a 90% degree of saturation, this
can indicate a network overload). Constraints can be of many types: financial, temporal,
geographical, technical, or areas/buildings that should not be threatened by new proposals.
2. Collection of data about the present state of the system of interest in order to support the
development of the analytical model. Data collection and model development are closely interrelated
as the latter defines which types of data are needed.
3. Construction of an analytical model of the system of interest. In general one would select the
simplest modelling approach possible to make a choice between schemes on a sound basis. The
construction of an analytical model involves specifying it, estimating and calibrating its parameters
and validating its performance.
4. Generation of solutions for testing. This can be achieved in many ways: ranging from tapping the
experience and creativity of local transport planners and interested parties to construction of a large-
scale design model, perhaps using optimisation techniques.
5. In order to test the solutions or schemes proposed in the previous step it is necessary to forecast
the future values of the planning variables which are used as inputs to the model.
6. Testing the model and solution. The performance of the model is tested under different scenarios
to confirm its reasonableness.
8. Implementation of the solution and search for another problem to tackle; this requires recycling
through this framework starting again at point (1).
[Figure: flowchart of the framework described above. Recoverable boxes: formulation of the problem; data collection; construct analytical model and calibrate; implement solutions.]
What are the differences between an airplane, an oil tanker, a car and a bicycle? Many indeed, but they
each share the common goal of fulfilling a derived transport demand, and they thus all fill the purpose of
supporting mobility. Transport is a service that must be utilized immediately and thus cannot be stored.
Mobility must occur over transport infrastructures, providing a transport supply. In several instances,
transport demand is answered in the simplest means possible, notably by walking. However, in some
cases elaborate and expensive infrastructures and modes are required to provide mobility, such as for
international air transport.
2 Based on: T. Schoenmaker, Samenhang in vervoer- en verkeerssystemen, 2002, Coutinho, Bussum; Rodriguez et
al., The geography of transport systems, 2006, Routledge.
There is a simple statistical way to measure transport supply and demand for passengers or freight. The
passenger-km is a common measure of realized passenger transport demand: it multiplies the number of
passengers transported by the distance over which they are carried. The ton-km is the corresponding
measure of realized freight transport demand. Although both the passenger-km and the ton-km
are most commonly used to measure realized demand, they apply equally to transport
supply.
For instance, the transport supply of a Boeing 747-400 flight between New York and London would be
426 passengers over 5,500 kilometres (with a transit time of about 5 hours). This implies a transport
supply of 2,343,000 passenger-km. In reality, there could be a demand of 450 passengers for that
flight, or 2,475,000 passenger-km, even though the actual capacity is only 426 passengers (if a
Boeing 747-400 is used). In this case the realized demand would be 426 passengers over 5,500
kilometres out of a potential demand of 450 passengers, implying a system where demand is at about 106% of
capacity.
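The arithmetic of this example can be checked with a short sketch. The function and variable names are illustrative assumptions, not part of the original notes; the input figures (426 seats, 450 would-be passengers, 5,500 km) are those quoted in the example.

```python
def passenger_km(passengers: int, distance_km: float) -> float:
    """Passenger-km: number of passengers multiplied by the distance carried."""
    return passengers * distance_km


capacity = 426         # seats offered on the flight (figure from the text)
potential = 450        # passengers who would like to travel (figure from the text)
distance_km = 5_500    # New York - London distance used in the text

supply_pkm = passenger_km(capacity, distance_km)
potential_pkm = passenger_km(potential, distance_km)
# Realized demand cannot exceed the capacity that is supplied.
realized_pkm = passenger_km(min(capacity, potential), distance_km)
demand_to_capacity = potential / capacity

print(supply_pkm, potential_pkm, realized_pkm, round(demand_to_capacity, 3))
```

The same function applies to freight if passengers are replaced by tonnes, which gives the ton-km.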
Transport demand is generated by the economy, which is composed of persons, institutions and
industries and which generates movements of people and freight. When these movements are expressed
in space they create a pattern, which reflects mobility and accessibility. The location of resources,
factories, distribution centers and markets is obviously related to freight movements.
2. Transport services, for enabling the movement of travellers and goods, using different transport
modes. These services can be public, semi-public or private (incl. walking). The space – time
distribution of transport services is strongly related to the distribution of activities in space – time.
For private means this follows directly from people's choices to travel. For ‘public’ means the travel
market determines whether services will be offered (supplied to the demand for travel).
3. Traffic services, for enabling the movement of transport modes through physical infrastructure and
management and operations of the infrastructure (incl. pricing policies).
The total traffic and transport system can be depicted as an interrelated system of layers (see Figure).
Between these layers two types of market exist: the travel market, between the demand for travel
(travel patterns) and the supply of transport services, and the traffic market, between the demand of
these services for infrastructure and the supply of infrastructure.
[Figure: layers of the traffic and transport system]
Layer: Travel demand
  Distribution in space and time: Travel patterns
  Elements: Travellers, freight
  Transport/traffic domain: Land-use/Transport Interaction; Activity-based analysis; Travel demand modelling; Travel and gender; Accessibility analysis; Impact analysis (macro); …
(travel market equilibrium)
Layer: Travel supply = Traffic demand
  Distribution in space and time: Transport services
  Elements: Modes of transport
  Transport/traffic domain: Network and corridor design; Location allocation; Intermodal planning (network, corridor, nodes); BRT & NMT services planning; Routing and logistics; …
(traffic market equilibrium)
Layer: Traffic supply
  Distribution in space and time: Traffic services
  Elements: Traffic infrastructure
  Transport/traffic domain: Traffic control and optimization; Infrastructure maintenance; BRT & NMT services control; Impact analysis (micro); ...
The theoretical background on transport systems can largely be derived from economic theory. There are
four aspects of economic theory that will be explained for transport problems. These are consumer travel
behaviour, demand, supply and equilibrium.
\[ \max(U) = U(X_1, \ldots, X_n) \]
\[ Y = P_1 X_1 + \ldots + P_n X_n \]
Figure 1.5 presents the solution of this problem when two types of goods (X1 and X2) are considered. The
indifference curve u presents the combinations of X1 and X2 corresponding with a given utility level. The
income line y presents the possible combinations of X1 and X2 corresponding with a given income level.
The equilibrium is reached at point E, and represents the point at which the individual’s valuation of the
goods is the same as the market valuation.
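The condition that holds at point E can be made explicit with a standard Lagrangian argument. The sketch below assumes the conventional reading of the symbols: the X_i are quantities of goods, the P_i their prices and Y the available income. The multiplier \lambda is introduced here purely for illustration.

```latex
\begin{aligned}
\mathcal{L} &= U(X_1,\ldots,X_n) + \lambda\,\bigl(Y - P_1 X_1 - \ldots - P_n X_n\bigr) \\
\frac{\partial \mathcal{L}}{\partial X_i} &= \frac{\partial U}{\partial X_i} - \lambda P_i = 0
  \qquad (i = 1,\ldots,n) \\
\Rightarrow\quad \frac{\partial U / \partial X_1}{\partial U / \partial X_2} &= \frac{P_1}{P_2}
\end{aligned}
```

For two goods, the left-hand side is the rate at which the individual is willing to trade X1 for X2 (the slope of the indifference curve) and the right-hand side is the rate at which the market trades them (the slope of the income line). The two are equal where the curves touch, which is the situation described for E.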
In this basic premise, the assumption is made that utility is generated by the quantity of goods, while, in
most cases, it is generated by the attributes of goods. The demand for a good therefore depends on its
price, characteristics and the characteristics of the consumer.
In case of transport, the “good” being demanded is a certain transport service. The “price” consists of all
perceived costs of the traveller, not only the monetary costs of the trip but also the time spent travelling.
Long-run costs are rarely used in utility functions as they are unlikely to influence the decision of a
traveller. If the monetary value of time (or other factors) is known, time and price can be combined to
yield a generalised cost of travel. This is, however, not necessary.
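As a minimal sketch of how time and price can be combined, the function below converts travel time into money using an assumed value of time and adds it to the out-of-pocket cost. The function name, the value of time of 10 currency units per hour and the two trip options are hypothetical illustrations, not values taken from these notes.

```python
def generalised_cost(money_cost: float, time_min: float,
                     value_of_time_per_hour: float) -> float:
    """Generalised cost in money units: monetary cost plus monetised travel time."""
    return money_cost + value_of_time_per_hour * time_min / 60.0


VOT = 10.0  # assumed value of time (currency units per hour), hypothetical

# Hypothetical options for one trip: (monetary cost, travel time in minutes)
options = {"car": (4.0, 30.0), "bus": (2.0, 48.0)}

costs = {mode: generalised_cost(cost, minutes, VOT)
         for mode, (cost, minutes) in options.items()}
cheapest = min(costs, key=costs.get)

print(costs, cheapest)
```

In this made-up case the bus is cheaper in money terms, but the longer travel time makes its generalised cost higher than that of the car.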
The utility of a trip, and therefore the demand for it, depends upon the characteristics of:
- The trip to be made
- The available modes
- The individuals making the trips
1.2.3 Demand
Demand for travel is actually a derived demand, as it is generated by the desire to join in activities, and
generally not by the desire just to travel. The transport system provides a physical connection between
activities.
Due to the derived nature of it, transport demand cannot be analysed without considering the socio-
economic activity system, as it is served by the transport system and generates travel demand. The
accessibility provided by the transport system can over longer periods influence where people live and
where economic activities occur. Therefore predicting land use patterns is necessary when travel demand
is forecast over longer periods of time.
Travel can be characterised in terms of time, monetary cost, inconvenience, discomfort, and so on,
associated with the trip. These characteristics represent the disutility or ‘generalised cost of travel’, as
one would prefer to spend less time travelling, incur less expense, and be more comfortable. It is
reasonable to assume that a potential trip maker will choose the option with the maximum (personal)
utility out of the mobility options available for the specified trip.
1.2.4 Supply
The supply curve expresses the quantity of a given good that will be supplied or produced as a function
of the price of the good. This function will always be upward sloping (or at least non-decreasing),
indicating that greater quantities of the good will be produced only if the price of the good rises. This is
because producing greater quantities leads to higher marginal operating costs. In the long run, however,
these costs can be reduced, which leads to a supply curve that lies under the original one.
It is clear that the supply function, as is the case with the demand function, depends on more factors
than just the price of the good, including the prices of the input factors and the technology used to
produce the good.
In transport, one of the possible definitions of supply is system performance. This can be seen as the
whole of travel times, headways and capacities provided by the transport system given a certain capital
investment, operating strategy and demand level. This leads to an inverted supply function, in the way
that the price (e.g. travel times and costs) is now a function of the quantity of the good (i.e. what level
of demand can be accommodated, i.e. the flows).
1.2.5 Equilibrium
In figures 1.5 and 1.6 the demand and supply curves are drawn in one diagram. The point of intersection
between the curves is called the equilibrium point. At this point the quantity demanded is equal to the
quantity supplied.
If shifts in demand and supply curves do not occur, markets can be expected to move towards the
equilibrium point. This can be explained in the following way:
- If demand is higher than supply, the prices would rise due to “bidding up” by the customers. This
stimulates an increase in supply and a decrease in demand, thus driving the market to the
equilibrium point.
- The other way around, if supply is higher than demand, the prices would fall, stimulating a decrease
in supply and an increase in demand.
It is assumed that there will also be equilibrium within a transport system, or at least it will arrive in such
a state after being left undisturbed for some time. There will of course be disequilibria due to, for
example, traffic accidents, but these will always be transient. It is, however, difficult to compare the
units of travel for demand and supply. As for demand, the units are counted in number of trips or
distances, while for supply the response of the system is related to volumes of traffic at different places
and times.
In figure 1.6, the vertical axis denotes supply in terms of the ‘price’ of travel (travel time + travel
costs = generalized costs) offered, and the horizontal axis denotes the demand in terms of the number of
trips made, i.e. the flow. The figure shows that if a shift occurs in, e.g., the supply curve (from supply 1 to
supply 2), a ‘new’ equilibrium point will be found following the same mechanism set out above. The
‘newly’ derived travel demand (on top of the already ‘revealed’ demand) is often called the ‘induced’
demand.
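The mechanism can be illustrated numerically. The sketch below assumes a linear demand curve and two linear, inverted supply curves (generalised cost as a function of flow). All coefficients and function names are hypothetical and chosen only to make the shift visible. The equilibrium is located by bisection, which is one simple way of finding the intersection and not a method prescribed by these notes.

```python
from typing import Callable


def equilibrium_flow(demand: Callable[[float], float],
                     supply_cost: Callable[[float], float],
                     q_max: float, tol: float = 1e-9) -> float:
    """Bisection on g(q) = demand(supply_cost(q)) - q, which decreases with q."""
    lo, hi = 0.0, q_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if demand(supply_cost(mid)) > mid:
            lo = mid   # more trips demanded than the flow tried: look higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def demand(cost: float) -> float:
    """Hypothetical demand curve: trips fall as generalised cost rises."""
    return max(0.0, 1000.0 - 20.0 * cost)


def supply_1(flow: float) -> float:
    """Hypothetical original supply: cost rises with flow."""
    return 10.0 + 0.02 * flow


def supply_2(flow: float) -> float:
    """Hypothetical improved supply: cost rises more slowly with flow."""
    return 10.0 + 0.01 * flow


q1 = equilibrium_flow(demand, supply_1, q_max=1000.0)
q2 = equilibrium_flow(demand, supply_2, q_max=1000.0)
induced = q2 - q1  # extra trips at the new equilibrium

print(round(q1, 1), round(q2, 1), round(induced, 1))
```

With these assumed curves the improved supply lowers the equilibrium cost, and the difference between the two equilibrium flows corresponds to the induced demand.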
Section 1.3.3 provides an overview of types of models that can be used in transport studies at different
levels of detail.
It is often possible to derive the same functional form from different theoretical perspectives (e.g. the
functional form of the gravity model that will be discussed later can be derived from analogy with
physics, entropy maximisation and maximum utility formalisms). The model output, however, is
dependent on the theory adopted. Using a theoretical framework also extends the credibility of a model
being able to forecast future behaviour.
The deductive approach has been found more productive in pure sciences, and the inductive approach
has been preferred in the analytical social sciences. In both cases data play a central role. The
availability and nature of data in many cases restricts the choice of a model to a single option.
14 UNIVERSITY OF TWENTE, THE NETHERLANDS
INTRODUCTION TO TRANSPORT PLANNING AND MODELLING
An issue closely related to the question of data is the type of variables to be represented in the model.
Models predict a number of dependent (endogenous) variables given other independent (explanatory)
variables. Data is needed on each variable to test the model. One type of variable is the policy variable,
which is interesting because it is under control of the decision-maker, and can therefore be varied by the
analyst in order to evaluate different policies.
Model specification
The following themes can be recognised, concerning model specification:
- Model structure: can the system be modelled by a simple structure, which assumes, for example,
that all alternatives are independent? Or is it necessary to build a complex model to be able to
calculate probabilities of choice conditional on previous selections?
- Functional form: is it possible to use linear forms, or does the problem require postulating more
complex non-linear functions?
- Variable specification: which variables are used, and in which form should they enter the model
(e.g. if income is assumed to influence individual choice, should it enter the model as a variable in
its own right, or as a deflator of a cost variable)?
The large majority of transport models have been built on cross-sectional data. This has led to a
tendency to interpret validation of the model in terms of the goodness-of-fit achieved between
observed behaviour and the base year predictions. Although this is a necessary condition for model
validation, it is not sufficient. Validation requires comparing the model predictions with information not
used during the model estimation process.
Starting with the modelling task, a modeller has to decide which variables are going to be predicted by
the model and which variables will be required as input to it. Some variables will never enter the model
because the modeller lacks control over them or because the theory behind the model ignores them. This
implies immediately a degree of error and uncertainty, which gets compounded with other errors
inherent to modelling, for example: sampling errors and errors due to the simplification of reality that is
unavoidable to make the model practical. See figure 1.5 for an overview on the modelling process.
It is interesting to mention that the twin concepts of model calibration and model estimation have
traditionally taken on different meanings in the transport field. Calibrating a model requires choosing its
parameters, assumed to have a non-null value, in order to optimise one or more goodness-of-fit
measures, which are a function of the observed data. This procedure has been associated with the
physicists and engineers responsible for the first generation of transport models who did not worry
unduly about the statistical properties of these indices, e.g. how large any calibration error could be.
Estimation involves finding the values of the parameters, which make the observed data more likely
under the model specification; in this case one or more parameters can be judged non-significant and left
out of the model. Estimation also considers the possibility of examining empirically certain specification
issues; for example, structural and/or functional form parameters may be estimated.
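The contrast can be sketched for a one-parameter binary logit model of mode choice. Everything in the example is a hypothetical illustration: the six observations are made up, the sum of squared errors stands in for "a goodness-of-fit measure", and a simple grid search replaces the optimisation routines that would be used in practice. No significance testing is shown.

```python
import math

# Hypothetical observations: (cost of car minus cost of bus, 1 if car was chosen else 0)
observations = [(-2.0, 1), (-1.0, 1), (-0.5, 0), (0.5, 1), (1.0, 0), (2.0, 0)]


def p_car(beta: float, dcost: float) -> float:
    """Binary logit probability of choosing car; beta > 0 means cost deters."""
    return 1.0 / (1.0 + math.exp(beta * dcost))


def log_likelihood(beta: float) -> float:
    """Log of the probability of the observed choices under the model."""
    total = 0.0
    for dcost, chose_car in observations:
        p = p_car(beta, dcost)
        total += math.log(p) if chose_car == 1 else math.log(1.0 - p)
    return total


def sum_squared_error(beta: float) -> float:
    """A simple goodness-of-fit index between observed and predicted choices."""
    return sum((y - p_car(beta, d)) ** 2 for d, y in observations)


grid = [i / 100 for i in range(0, 501)]              # candidate beta values 0.00 ... 5.00
beta_calibrated = min(grid, key=sum_squared_error)   # calibration: optimise a fit index
beta_estimated = max(grid, key=log_likelihood)       # estimation: maximise the likelihood

print(beta_calibrated, beta_estimated)
```

The two criteria need not select the same parameter value. The likelihood-based route additionally supports statistical statements about the parameter, which is the aspect the text associates with estimation.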
The main use of models in practice is for conditional forecasting (i.e. the model produces estimates of
dependent variables given a set of independent variables). Typical forecasts are conditional in two ways:
- In relation to the values assigned to the policy variables, of which the impact is being tested with the
model
- In relation to the assumed values of other variables
A model is normally used to test a range of alternative plans for a range of possible future values of the
other variables. This means that the model has to be ‘run’ many times to generate all outcomes for the
ranges that are mentioned above. A lot of computing power is needed to guarantee a quick turnaround
time, given that transport models involve complex equilibration processes and contain considerable
amounts of data.
Of central interest is the aggregation of exogenous data, that is, information about items other than the
travel behaviour (this is the endogenous or dependent variable, which the model attempts to replicate).
Exogenous data can be seen in many cases as input to the value of the independent variables.
In aggregate or first generation models (such as the trip distribution and modal split models that will be
discussed in chapters 5 and 6), the model at base aims at representing the behaviour of more than one
individual. These models were used up to the late 1970s. They became familiar, demanded relatively few
skills and have the property of offering a ‘recipe’ for the complete modelling process. First generation
models have on the other hand been criticised for their inflexibility, inaccuracy and cost.
Disaggregate or second generation models attempt to represent the behaviour of individuals (e.g.
discrete choice models). They became increasingly popular in the 1980s, and offer substantial
advantages over the traditional methods while remaining practical in many application studies. However,
they demand a higher level of statistical and econometric skills from the analyst than is the case with
aggregate models.
The differences between first and second generation model systems have often been overstated, as the
disaggregate models have been seen as ‘revolutionary’ while eventually it became clear that an
‘evolutionary’ view was more adequate. In many cases there is a complete equivalence between the
models. The difference lies in the treatment of the description of behaviour, particularly during the model
development process. The disaggregate approach is superior in that respect.
The issue is whether one of the two approaches is to be preferred, and in what circumstances. It has been
concluded that no single approach is appropriate to all situations; the best
approach therefore needs to be chosen for each situation.
1. Microscopic simulation, of individual units in a traffic stream. For example, for the assessment of
individual vehicle or driver performance at an intersection or along a link.
2. Macroscopic flow models, in which the flow units are assumed to behave in some collective
fashion.
3. Simulation models of flows in intersection clusters, for the optimisation of network performance
(e.g. delays at traffic signals when the flows on each road section or link are fixed).
4. Dense network models, which simulate flows in small-scale networks where the level of flow on
each link can vary in response to changes in the traffic control system and traffic congestion levels.
These models focus on short time periods (e.g. a peak hour).
5. Strategic network models, which simulate or optimise network flows in the large-scale networks,
which represent a regional or metropolitan transport system. These models focus on long time
periods (e.g. 24 hour flows).
6. Land use impact assessment models, that focus on the extent of changes to new land use
facilities (e.g. a retail centre) and use a rudimentary description of the transport system serving that
facility in predicting its impacts on the surrounding region.
7. Sketch planning models, of land use-transport interactions.
The models presented in this lecture note can be used as the components of the above models 4 to 7.
The approach starts with considering a network and zoning system (see chapter 3) and the
collection of data (see chapter 2). These data are used to estimate a model of the total number of trips
generated by or attracted to each zone of the study area: the trip generation model (see chapter 4).
The next step is to allocate these trips to particular destinations, so a trip matrix can be produced. This is
called trip distribution (see chapter 5). The following stage is usually the modelling of the choice of
mode, which is called modal split (see chapter 6). The last stage in the classic model requires the
assignment of the trips by each mode to their corresponding networks (see chapter 7).
The classic transport model is seen as concentrating on only a limited range of travellers’ responses.
Current thinking requires an analysis of a wider range of responses to transport problems and schemes.
For example, when a trip maker is faced with increased congestion, he can respond with a range of
simple changes to:
- The route followed to avoid congestion or take advantage of new links
- The mode used to get to the destination
- The time of departure to avoid the most congested part of the peak
- The destination of the trip to a less congested area
- The frequency of journeys by undertaking the trip on another day
Alternative methods
Some contemporary approaches attempt to treat simultaneously the choices of trip frequency,
destination and mode of travel, thus collapsing trip generation, distribution and modal split in one single
model. Other approaches emphasise the role of the household activities and the travel choices they
entail: the so-called activity-based models. These are more difficult to cast into the four-stage model,
and they are not yet in operational use. However these models provide an improved understanding of
travel behaviour and are therefore likely to enhance conventional modelling approaches in the future.
decision variables depending on the trip generation unit (e.g. household). However, this makes it difficult
to include attributes of the journey and modes in the model. Therefore it could be better to perform trip
distribution and modal split simultaneously.
It should be noted that the classic model makes trip generation inelastic to the level of service provided
in the transport system. This is probably unrealistic, but only recently have techniques been developed
which can take systematic account of these effects.
Once the model has been calibrated and validated for the base year conditions it must be applied to one
or more planning horizons. Therefore different scenarios and plans should be developed that describe the
transport system and planning variables under alternative futures. After that, the model can be run again
(several times, depending on the number of alternatives) with this new input. A comparison can then be
made, most likely between costs and benefits, of different schemes under different scenarios, from which
the most attractive programme can be chosen. This depends on the conditions that it is subject to.
An important issue in the classic four-stage model is the consistent use of variables affecting demand.
For example, at the end of the assignment stage, new flow levels and therefore new travel times are
obtained. These are unlikely to be the same as the travel times assumed when the trip distribution and
modal split models were run. So this calls for a re-run of these models, but if after that the assignment is
run again, this will again result in a new set of travel times. Trying to solve this problem by repeating
this procedure (iterations) has been seen not to lead to equilibrium where travel times are concerned.
There are methods to find equilibrium in the assignment, which will be discussed in chapter 7. There is a
particular risk in choosing the wrong plan, depending on how many iterations one is prepared to
undertake.
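A toy example shows why simply repeating the procedure may fail to settle. The demand and congestion relations below are hypothetical linear functions chosen so that the naive loop oscillates. The averaging scheme shown as a contrast (moving only a fraction 1/n of the way towards the new travel time, often called the method of successive averages) is offered as an illustration only, not as the method presented later in these notes.

```python
def demand(travel_time: float) -> float:
    """Hypothetical demand: trips fall as travel time rises."""
    return max(0.0, 3000.0 - 100.0 * travel_time)


def travel_time(flow: float) -> float:
    """Hypothetical congestion relation: travel time rises with flow."""
    return 10.0 + 0.015 * flow


def rerun(t: float) -> float:
    """One pass: demand models run with time t, then assignment yields a new time."""
    return travel_time(demand(t))


# Naive feedback: feed the new travel time straight back into the demand models.
naive = [10.0]
for _ in range(10):
    naive.append(rerun(naive[-1]))

# Averaged feedback: move only a fraction 1/n of the way towards the new time.
t_avg = 10.0
for n in range(1, 201):
    t_avg += (rerun(t_avg) - t_avg) / n

print(naive[-2:], round(t_avg, 3))
```

In this made-up case the naive loop keeps jumping between a low and a high travel time, so the forecast depends on the iteration at which the analyst stops. The averaged loop approaches the travel time at which the demand and assignment stages are mutually consistent.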
factors such as truck movement, highway geometry and other factors affecting capacity in their
calculations.
- Time of day variations. Traffic varies considerably throughout the day and during the week. The
travel demand forecasts are made on a daily basis for a typical weekday and then converted to peak
hour conditions. Daily trips are multiplied by an "hour adjustment factor", for example 10%, to
convert them to peak hour trips. The number assumed for this factor is very critical. A small
variation, say plus or minus one percent, will make a large difference in the level of congestion that
would be forecast on a network.
- Emphasis on peak hour travel. As described above, forecasts are done for the peak hour on a
typical weekday. A forecast for the peak hour of the day does not provide any information on what is
happening the other 23 hours of the day. The duration of congestion beyond the peak hour, i.e. peak
spreading, is not determined. In addition travel forecasts are made for an 'average weekday'.
Variation in travel by time of year or day of the week is usually not considered.
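The sensitivity to the hour adjustment factor is easy to see with a small calculation. The daily volume and the hourly capacity below are hypothetical round numbers; only the 10% factor and the variation of one percentage point come from the text.

```python
daily_trips = 20_000        # hypothetical daily trips on a link
capacity_per_hour = 2_000   # hypothetical hourly capacity of that link


def peak_hour_trips(daily: int, factor_percent: int) -> float:
    """Convert daily trips to peak hour trips with an hour adjustment factor."""
    return daily * factor_percent / 100


# Volume/capacity ratio in the peak hour for factors of 9%, 10% and 11%.
ratios = {pct: peak_hour_trips(daily_trips, pct) / capacity_per_hour
          for pct in (9, 10, 11)}

for pct, ratio in ratios.items():
    print(f"factor {pct}%: volume/capacity = {ratio:.2f}")
```

With these assumed figures, a change of one percentage point in the factor moves the link from comfortably below capacity to above it, which is why the choice of factor matters so much for the forecast level of congestion.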
The data that are generally required for transport studies can be subdivided into the categories of
supply and demand:
Supply data
- Capacity (function of number of lanes or public transport vehicles)
- Design speed
- Type of service provided (e.g. freeway, local road, express-bus service, train service)
- Use restrictions (e.g. turn prohibitions, parking permitted or prohibited, operation only in peak)
- Parking places
Demand data
- Volumes of use by time of day, trip purpose, means of travel and specific location
- Current actual speed (peak and off-peak)
- Costs and times experienced by users, by time of day or by origin-destination locations
- Attributes of users that relate to levels of use and methods of use (e.g. income, age, car ownership,
household size, working status)
It will not be possible to collect all these types of data in just one survey, because of the difference in
survey methods, survey instruments and sampling procedures.
This chapter describes methods by which demand data can be collected. It starts with an introduction to
survey planning. The next section is on sampling methods, followed by a section on errors in data
collection and modelling. The last section is on data collection methods. The following chapters
describe how the data are analysed for use in the submodels of the traditional four-stage
transport model.
Based on: P.R. Stopher, 'Survey and sampling strategies', in: D.A. Hensher & K.J. Button (eds.), Handbook of
transport modelling, 2000, p. 229-250; M. Taylor, W. Young & P. Bonsall, Understanding traffic systems: Data, analysis
and presentation, 1996, p. 129-156, 247-266; J. de D. Ortúzar & L.G. Willumsen, Modelling Transport, 1994, p. 55-108
Objectives
At the start of the data collection exercise it is necessary to define the objectives of the survey, as they
can be seen as the starting point of the survey. Questions to be asked to define the objectives are, for
example:
• Is the survey required as part of an ongoing monitoring process or an ad-hoc investigation?
• Are the results supposed to relate to a specific place or are general results sought?
• What hypotheses are to be tested?
• What level of disaggregation is required?
Available resources
Examples of resources are time, people and money. Usually, these resources are a constraint on the
specification of the survey. Compromises often have to be made between what the analyst ideally
wants, and what can actually be afforded. A typical household survey, for example, may easily run to
hundreds of interviews, requiring a great deal of labour and therefore cost.
Design of sample
The sample design is interrelated with the choice of survey instrument. For example, a certain
instrument might be chosen that needs a minimum number of observations to reduce measurement
error, which determines the sample size. On the other hand, if the sample requires data recorded every
second, manual measurement techniques are inadequate.
More about sample design is discussed in section 2.2.
Survey plan
The initial survey plan will be based on the decisions taken on survey instrument and sample design. It
will also include operational/procedural aspects such as the recruitment of staff, acquisition of
equipment and the schedule of key events (figure 2.2).
Each survey schedule is unique, and has to be drawn up as a critical path flow chart. The job is about
fitting the required steps around fixed dates and other constraints. These will include the latest date by
which results are required, the window of opportunity for the survey (e.g. in case seasonal factors play
a role) and constraints in resources (staff, equipment). Each survey also has its own additional
constraints.
Although everyone would like the schedule to include time for unforeseen circumstances, this is almost
always impossible because of the compromise that has to be made between the ideal survey plan and
a plan that delivers the data by the agreed date.
Pilot survey
The pilot survey is an important element in the survey plan. In the pilot survey the survey instruments
and the associated procedures can be tested. It may be scaled down when standard procedures and
equipment are used, but when innovations are introduced a pilot survey is vital for the success of
the survey. The piloting can be done at different levels, but in all cases it is necessary to reserve
sufficient time and resources for revision and redesign of the survey plan. In the worst case the pilot survey
may lead to major adjustments to the survey plan, or even to its abandonment. It is of course better to
reach that conclusion at this stage of the survey than later.
carefully archived along with other information relevant to conduct of the survey, namely factors that
could have affected the data (e.g. weather conditions).
In addition to these procedures, it is good practice to archive the raw data. This can be useful if,
for example, the processing may have been subject to error. Archiving can be quite costly, so a limit
is usually put on the storage time of raw data. No such limit should apply to the other elements
of the survey report, since they are indispensable for a correct interpretation of the results.
Sampling is used when it is not economically (or sometimes technically) feasible to observe an entire
population. However, the problem then arises how to expand the data in the sample to data valid for
the entire population. So there are two difficulties:
• How to ensure a representative sample
• How to extract valid conclusions from a sample
In this section sample design will be explained, along with the description of sampling methods
applicable to transport studies.
Target population
The target population is the population that is of interest for the given study area and from which the
sample has to be drawn. This can be a population that is directly influenced by changes to be made to
the transport system, but it can also be a population outside the area of interest, which will be used as
a comparison to rule out the effects that have nothing to do with the proposed changes.
It should be investigated (e.g. by means of a pilot survey) if there are important subgroups in the
population for which the effects are significantly different from the rest of the population.
Sampling unit
The definition of the sampling unit depends primarily on the nature and purpose of the study, but may
also be constrained by practical considerations involved in collecting the required data. A population
consists of individuals, or individual items, such as persons or vehicles. Sampling units can be
individuals, but also more aggregate units like households, buses or geographical areas.
Sampling frame
The sampling frame is a list that contains all members of the target population, and from which
the actual size of the population can be determined. An example of a sampling frame is a list of
all vehicles registered in a certain area.
Sampling method
There are two main methods in sampling: random sampling and judgement sampling. In random
sampling all members of the target population have the same chance of being chosen for the sample.
Judgement sampling uses personal knowledge, expertise and opinion to identify sample members. Judgement
samples are convenient and can be used in case studies, for example. However, they cannot
represent the target population because they have no statistical meaning. They can be used in pilot
surveys to examine the possible extremes of outcomes using minimal resources.
In random sampling there are four basic methods available:
• Simple random sampling
• Stratified random sampling
• Cluster sampling
• Systematic sampling
These methods will be further described in section 2.2.2.
Sampling bias is caused by mistakes in defining the target population, in selecting the sampling method or
in any other stage of sample design. It differs from sampling error in two respects: it affects the mean
value of the estimated parameter as well as the variability around it, and it can be eliminated by taking
care during sample design and data collection.
Together, the two errors described above contribute to the measurement error of the data. Errors are
discussed further in section 2.3.
Sample size
As seen above, the reliability of a sample increases with its size. This means that there has to be a
trade-off, since increasing the sample size also implies increasing costs. An optimum sample size has
to be found at which both reliability and costs are reasonable.
Systematic sampling
Systematic sampling is a simple and convenient method of selecting a pseudo-random sample. At first a
random starting point in a sampling frame is chosen, and from this starting point every nth element in
the sampling frame is selected. For example, when the sample should contain 10% of the population,
the starting point is selected out of the first 10 elements, and after that, every 10th element is selected.
Systematic sampling has two advantages over simple random sampling:
• It is quick and demands only limited resources
• It can easily be applied by unskilled workers
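The selection procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling frame is a hypothetical list of 200 numbered addresses, and the function name `systematic_sample` is invented for this example.

```python
import random


def systematic_sample(frame, interval, rng=random):
    """Pick a random start among the first `interval` elements of the frame,
    then select every `interval`-th element from that starting point."""
    start = rng.randrange(interval)
    return frame[start::interval]


# Hypothetical sampling frame: 200 numbered addresses.
frame = list(range(1, 201))

# A 10% sample: random start within the first 10 elements, then every 10th element.
sample = systematic_sample(frame, 10, random.Random(1))
print(len(sample), sample[:3])
```

A seeded `random.Random` instance is passed in only to make the example repeatable. Note that if the frame has a periodic ordering that coincides with the interval, the sample is no longer representative.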
The Central Limit Theorem (CLT, from statistics theory) states that the estimates of the mean tend
to become Normally distributed as the sample size (n) increases. As a rule of thumb this holds for any
population distribution if n > 30, and also for smaller samples if the population itself has a Normal-like
distribution.
Consider a (target) population of size N with mean µ and variance σ². The CLT
states that the mean x̄ of successive samples is Normally distributed with mean µ and standard
deviation se(x̄), the standard error of the mean, given by:
$$se(\bar{x}) = \sqrt{\frac{(N - n)\,\sigma^2}{n\,N}}$$
If only one sample is considered, the best estimate of µ is x̄ and the best estimate of σ² is the
sample variance S², so that the standard error is estimated as:
$$se(\bar{x}) = \sqrt{\frac{(N - n)\,S^2}{n\,N}}$$
This is a function of three factors: N, n and S². For large populations and small sample sizes, which is
mostly the case, the factor (N-n)/N is very close to 1, which reduces the function to:
$$se(\bar{x}) = \frac{S}{\sqrt{n}}$$
This means that, for example, quadrupling the size of the sample will only halve the standard error. The
required sample size may now be estimated using the last two equations, first calculating n' from the
last one:
$$n' = \frac{S^2}{se(\bar{x})^2}$$
and then, if the population is not large relative to n', correcting for the finite population size by solving
the preceding equation for n:
$$n = \frac{n'}{1 + n'/N}$$
Although the above is quite objective, two important problems make it less straightforward in practice.
First, the sample variance S² can only be computed once the sample has been drawn. Beforehand it
therefore has to be estimated from other sources.
Second, an acceptable level for the standard error has to be chosen. This is related to the desired
degree of confidence to be associated with the use of the sample mean as an estimate of the population
mean. Confidence is in practice specified as an interval around the mean, therefore there are two
judgements needed to calculate an acceptable standard error:
- A confidence level for the interval must be chosen (a confidence level of 95% means accepting that
the interval fails to contain the true mean in 5% of the cases).
- It is necessary to specify the limits of the confidence interval around the mean, either in absolute or
relative terms.
For more information on this subject the reader is referred to statistics theory.
Based on: J. de D. Ortúzar & L.G. Willumsen, Modelling Transport, 1994, p. 58-59
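The sample size calculation in box 2.1 can be sketched as follows. This is a minimal illustration, not part of the cited source: the function names are invented, the standard deviation and interval width in the usage lines are assumed values, and the acceptable standard error is derived from the usual Normal relation (half-width of the confidence interval divided by z).

```python
import math


def standard_error(S, n, N=None):
    """Standard error of the mean; includes the (N - n)/N factor when N is given."""
    if N is None:
        return S / math.sqrt(n)
    return math.sqrt((N - n) * S**2 / (n * N))


def target_standard_error(half_width, z=1.96):
    """Acceptable standard error for a confidence interval of +/- half_width.
    z = 1.96 corresponds to a 95% confidence level."""
    return half_width / z


def required_sample_size(S, se_target, N=None):
    """n' = S^2 / se^2, optionally corrected for a finite population of size N."""
    n_prime = S**2 / se_target**2
    if N is not None:
        n_prime = n_prime / (1 + n_prime / N)
    return math.ceil(n_prime)


# Assumed values: S = 2.0 trips per day, mean wanted within +/- 0.2 trips at 95%.
se = target_standard_error(0.2)
print(required_sample_size(2.0, se))          # large population
print(required_sample_size(2.0, se, N=5000))  # finite population of 5000
```

The value of S still has to come from an earlier survey or a pilot, which is the first of the two practical problems mentioned in the box.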
The advantage of stratified random sampling is that differences between subgroups of the population can
be recognised, which is not the case with simple random sampling.
In box 2.2 an example is given to calculate the probabilities to find an individual with certain properties
in a sample drawn from a given population using different sampling methods.
Cluster sampling
This method requires division of the target population into clusters, from which a random sample is
drawn. For example, a random sample of streets (the clusters) can be selected from a municipality
for a trip generation survey, and then each household in those streets is surveyed. This is
convenient when distributing a mail questionnaire or conducting a household interview survey.
The assumption is made here that there would be no bias in selecting some streets to represent the
municipality. However, there could be a bias due to misrepresenting socio-economic groups, age of
housing, etc.
Both stratified random sampling and cluster sampling divide the target population into well-defined
groups. The difference is that stratified random sampling should be used when each group has small
internal variations, but there is a wide variation between the groups. Cluster sampling should be used
when the groups have a considerable internal variation, but the groups have essentially the same
characteristics.
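The practical difference between the two methods is where the random draw takes place. The sketch below is an illustration only: the strata, the streets and the function names are hypothetical. In the stratified case a random sample is drawn inside every group; in the cluster case whole groups are drawn at random and surveyed completely.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction within every stratum."""
    return {
        name: rng.sample(members, max(1, round(fraction * len(members))))
        for name, members in strata.items()
    }


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(0)

# Hypothetical strata: household identifiers grouped by income class.
strata = {"low income": list(range(60)), "high income": list(range(60, 100))}
print(stratified_sample(strata, 0.1, rng))

# Hypothetical clusters: household identifiers grouped by street.
streets = {"street A": [1, 2, 3], "street B": [4, 5], "street C": [6, 7, 8, 9]}
print(cluster_sample(streets, 2, rng))
```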
Assume that for the purposes of a transport study the population of a certain area has been classified
according to two income categories, and that there are only two means of transport available (car and
bus) for the journey to work. Let us also assume that the population distribution is given by:
1. Random sample. If a random sample is taken, it is clear that the same population distribution
would be obtained.
2. Stratified sample. Consider a sample with 75% low income (LI) and 25% high income (HI)
travellers.
From the previous table it is possible to calculate the probability of a low-income traveller using bus, as:
Now, given the fact that the stratified sample has 75% of individuals with low income, the probability of
finding a bus user with low income in the sample is 0.75 x 0.692 = 0.519. Proceeding analogously, the
following table of probabilities for the stratified sample may be built:
3. Choice based sample. Let us assume now that we take a sample of 75% bus users and 25% car
users. In this case, the probability of a bus user having low income may be calculated as:
Therefore, the probability of finding a low-income traveller choosing bus in the sample is 0.75 x 0.75 =
0.563. Proceeding analogously, the following table of probabilities for the choice-based sample may be
built:
Source: J. de D. Ortúzar & L.G. Willumsen, Modelling transport, 1994, p. 62-63
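The calculations in box 2.2 can be retraced with a short script. The tables of the box are not reproduced in this text, so the population shares below are assumed values, chosen only because they are consistent with the conditional probabilities 0.692 and 0.75 quoted above. The function names are likewise invented for this sketch.

```python
# Assumed joint population shares by (mode, income); illustrative values only,
# chosen to be consistent with the 0.692 and 0.75 quoted in the box.
population = {
    ("bus", "LI"): 0.45, ("bus", "HI"): 0.15,
    ("car", "LI"): 0.20, ("car", "HI"): 0.20,
}


def marginal(pop, position, value):
    """Total population share of one mode (position 0) or income group (position 1)."""
    return sum(share for key, share in pop.items() if key[position] == value)


def resample(pop, position, sample_shares):
    """Joint probabilities in a sample whose shares along one dimension are fixed.

    Each cell is the fixed sample share of its group multiplied by the
    conditional probability of the cell within that group in the population.
    """
    return {
        key: sample_shares[key[position]] * share / marginal(pop, position, key[position])
        for key, share in pop.items()
    }


# Stratified sample: 75% low income, 25% high income.
stratified = resample(population, 1, {"LI": 0.75, "HI": 0.25})
# Choice-based sample: 75% bus users, 25% car users.
choice_based = resample(population, 0, {"bus": 0.75, "car": 0.25})

for name, table in (("stratified", stratified), ("choice-based", choice_based)):
    print(name, {key: round(p, 3) for key, p in table.items()})
```

The cell for low-income bus users differs between the two samples, which shows that the sampling method changes the composition of the sample and must be accounted for when the data are expanded.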
However, it often occurs that these conditions are not satisfied. And even if they were, the problem
remains that model forecasts are usually subject to errors due to inaccuracies in the values of
explanatory variables in the design year.
Models are often built with the aim of forecasting demand in the (near) future. A trade-off has to be
made between model complexity and data accuracy so as to fit within the required forecasting precision and
the study budget. Two types of errors should therefore be distinguished:
• Errors that could cause even correct models to yield incorrect forecasts
• Errors that actually cause incorrect models to be estimated
The next section explains the different types of error that can arise when building, calibrating and
forecasting with models. The following section is on the trade-off between model complexity and data
accuracy.
Sampling errors
These errors are, as explained before, due to using a sample instead of the entire population. Increasing
sample size can reduce sampling error. However, the sample size has to be quadrupled to halve the error.
Computational errors
Models are generally based on iterative procedures. In the case of complex models the exact solution is
often not found because of computational costs. This gives rise to computational errors. In most cases they
are typically small in comparison with other errors, except for cases such as assignment of congested
networks as will be seen later in chapter 7, or equilibration between supply and demand in complete
model systems.
Specification errors
These arise either because the phenomenon being modelled is not well understood or because it needs
to be simplified for whatever reason. Important subclasses of this type of error are the following:
• Inclusion of an irrelevant variable
• Omission of a relevant variable
• Exclusion of taste variations on the part of the individuals
• Other specification errors: the use of model forms which are not appropriate (e.g. linear functions
representing non-linear effects)
Increasing model complexity can reduce all specification errors. However, this comes at increasing
cost. It has to be accepted that specification errors may be present in all feasible models.
Transfer errors
These occur when a model developed in one context of time and/or place is applied in a different one.
Adjustments can be made, but it can always be the case that behaviour is different in another context.
In case of a spatial transfer (using the same model in another place), errors can be reduced or
eliminated by partial or complete re-estimation of the model in the new context. The latter, however,
would mean that using an existing model offers no cost advantage.
In case of a temporal transfer (using the same model for future situations), re-estimation is not possible
because of the lack of data, which means that any error must be accepted.
Aggregation errors
These arise out of the need to use groups of people instead of individuals to make forecasts. Behaviour
would have been captured better if modelling were done on the individual level. Important subclasses of
this type of error are the following:
• Data aggregation (this causes some form of specification error because population averages are
used instead of individual values)
• Aggregation of alternatives
• Model aggregation
In box 2.3 the calculation of the influence of errors in input variables on model accuracy is explained.
An increase in complexity can be defined as an increase in the number of variables and/or in the number
of algebraic operations with the variables. The specification error (e_s) can be expected to decrease
with increasing complexity. However, there are more data to be measured, so the measurement error
(e_m) will increase.
If the total modelling error is defined as $E = \sqrt{e_s^2 + e_m^2}$, it can be seen that the minimum of E does
not necessarily lie at the point of maximum complexity. This is shown in figure 2.3. The figure
also shows that the larger the measurement error, the more the total error increases as
complexity increases.
Figure 2.4 illustrates that if data are not of good quality (which is often the case in developing or poor
countries), it might be safer to predict with simpler and more robust models. However, better-specified
models are always preferable.
Consider the observed variables x_i with the associated errors e_{x_i} (standard deviations). The output error
e_z derived from the propagation of error in a function such as:
$$z = f(x_1, x_2, x_3, \ldots, x_n)$$
can be found with the following formula:
$$e_z^2 = \sum_i \left(\frac{\partial f}{\partial x_i}\right)^2 e_{x_i}^2 + \sum_i \sum_{j \neq i} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}\, e_{x_i} e_{x_j} r_{ij}$$
In this formula r_{ij} is the coefficient of correlation between x_i and x_j. The formula is exact for linear
functions, and a reasonable approximation in other cases.
It is clear from this formula that not using correlated variables can reduce error.
The partial derivative of e_z with respect to e_{x_i} (ignoring the correlation term) is:
$$\frac{\partial e_z}{\partial e_{x_i}} = \left(\frac{\partial f}{\partial x_i}\right)^2 \frac{e_{x_i}}{e_z}$$
This yields the marginal improvement rate per variable. Along with the estimated marginal costs of
enhancing data accuracy it should be possible to determine an optimum improvement budget. However, it
is not that simple because these marginal costs are not constant but proportionate to the amount of error
reduction.
From the above derivative two rules can be deduced that can be used in determining which variable
should be improved to achieve the largest reduction of total error:
• Concentrate the improvement effort on those variables with a large error
• Concentrate the effort on the most relevant variables, i.e. those with the largest partial derivative
∂f/∂x_i, as they have the largest effect on the dependent variable
Based on: J. de D. Ortúzar & L.G. Willumsen, Modelling Transport, 1994, p. 70-71
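The error propagation formula of box 2.3 and the marginal improvement rate can be evaluated numerically. The sketch below is an illustration under assumptions: the function names are invented, and the model z = 2·x1 + 0.5·x2 with input errors 3 and 4 is a made-up example.

```python
import math


def propagated_error(gradient, errors, corr=None):
    """Output error e_z from the partial derivatives, the input errors and,
    optionally, a matrix of correlation coefficients r_ij."""
    n = len(gradient)
    variance = sum((gradient[i] * errors[i]) ** 2 for i in range(n))
    if corr is not None:
        for i in range(n):
            for j in range(n):
                if i != j:
                    variance += (gradient[i] * gradient[j]
                                 * errors[i] * errors[j] * corr[i][j])
    return math.sqrt(variance)


def marginal_improvement(gradient, errors, i):
    """Partial derivative of e_z with respect to e_xi, ignoring correlation."""
    e_z = propagated_error(gradient, errors)
    return gradient[i] ** 2 * errors[i] / e_z


# Hypothetical linear model z = 2*x1 + 0.5*x2, so the partial derivatives are constants.
gradient = [2.0, 0.5]
errors = [3.0, 4.0]

print(propagated_error(gradient, errors))
for i in range(len(gradient)):
    print(f"x{i + 1}: marginal improvement rate {marginal_improvement(gradient, errors, i):.3f}")
```

In this made-up case x2 has the larger error, but x1 has by far the larger partial derivative and therefore the larger marginal improvement rate, which illustrates why both rules have to be considered together.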
Practical limitations
• Length of the study. This determines how much time and money can be devoted to data collection.
• Study horizon. Whether the design year is near or far away can determine the type of survey that
will be used.
• Limits of the study area. Formal political boundaries (county or district boundaries) should be
ignored; the study should concentrate on the whole area of interest.
• Study resources. These have to be known in advance to determine which method is to be used.
Questionnaire respondents must also be seen as resources!
The emphasis in this section will be on data collection on the demand side, more specifically on origin-destination (O-D) travel
surveys. Therefore the study area has to be defined. The external boundary is known as the external
cordon. The area within this cordon has to be divided into (internal) zones. More about how this should
be performed is discussed in chapter 3+1. The area outside the cordon is also divided in (external)
zones, which are substantially larger than the internal zones. Inside the study area there can be internal
cordons and screen-lines. A screen-line is an artificial divide following a natural or artificial boundary
with few crossings (e.g. a river or a railway track).
Cross-sectional data are collected at a single point in time, while longitudinal data are
collected at different points in time, using the same sample every time. The methods of data collection
can be the same for both strategies. More about longitudinal data is discussed in section 2.4.5.
Revealed preference data are the observed choices and decisions of the travellers, while stated
preference data are collected using the response to hypothetical choices. Of course, the latter can only
be done with interviews or questionnaires. More about stated preference surveys is discussed in section
2.4.4.
General considerations
It is widely recognised that the procedures and measuring instruments used for data collection influence
the data collection results, and should therefore be included in the survey planning process. Of course,
each method has its shortcomings and criticisms. For household-based surveys, some of the frequent
criticisms are:
• The surveys measure average rather than actual travel behaviour of individuals
• Only part of the individual's movements can be investigated
• Information is often poorly estimated by the interviewee (e.g. travel times)
Analysis of these criticisms has led to two conclusions. First, travel behaviour should not
be asked about in general terms (averages) but with reference to a specific point in time. This has led to
substantial improvement of measurement procedures. Second, the various activities should not be
examined in isolation, but as a complete pattern of activities. For example, asking for the starting and
ending times of an activity proved to lead to more accurate results than asking for travel times. This
resulted, for example, in the travel diary method, which will be discussed below.
Survey date
The date on which the O-D survey should be performed depends on its objectives, but mostly the
objective will be to survey travel behaviour during a working day. These working days can best be
selected in spring or autumn, as summer includes holidays and in winter travel behaviour can be
influenced by climatic conditions.
Survey period
Ideally all households in the selected sample should be interviewed on one single day. In practice this is
done over several days, which reduces the number of interviewers needed and allows them to become more
experienced in the job. In most cases, the sum of the responses over several working days seems to be
a good representation of the answers that would have been obtained on one single day.
Questionnaire design
The order in which the questions are asked should minimise resistance on the part of the interviewee.
This means that 'difficult' questions should be at the end of the interview. Furthermore, the following
aspects should be satisfied in composing the questionnaire or interview:
• The questions should be simple and direct
• The number of open questions should be minimised
• The information about travel must be elicited with reference to the activities which originated the
trips
• Each member of the household older than 12 should be interviewed personally. For younger
members, another member of the household may answer on their behalf.
Sample size
Traditionally, large random samples were used for household O-D surveys. Table 2.1 shows the values
that have been put forward as recommended practice, although they are rarely used. Where they were used,
particularly in developing countries, the sample sizes were enlarged by 20% to compensate for
validation losses, because the recommended sample sizes were believed to be essential.
To reduce the enormous sample sizes, statistics can be used to estimate them (see box 2.1). However,
this requires knowledge about the variable to be estimated, its coefficient of variation, and the desired
accuracy of measurement together with the level of significance associated with it.
The socio-economic information and trip rates by purpose are registered first and used for the
correction process of the O-D survey. After that, the data on each trip are examined in more detail and
used for disaggregate choice model estimation.
Workplace surveys
Workplace surveys are very similar to household surveys. The difference is that the data are collected
at the workplace and not at home. This is particularly suitable for corridor-based journey-to-work
studies. The local authority asks a sample of employers in a certain district for permission to interview a
sample of their employees. In some cases it is efficient to ask for the sample of employees to be
distributed by residence. However, it must be noted that the data collected from such a sample are choice
based, and not random as in the household case. Nevertheless, the sample is random with respect to
mode.
The best survey times are, of course, during normal working hours. Thus, the survey period is
extended considerably with respect to a household survey, which is attractive because the
interviewers' time is much better used.
Cordon survey
A cordon survey provides information on trips originating in external zones and ending in or passing
through the study area. With this information O-D matrices following from household surveys can be
completed. The external cordon is defined as the boundary of the complete study area. Internal cordons
can also be used. The location of the cordons should be carefully chosen (see figure 2.5). The survey
can be performed using techniques such as roadside interviews or registration plate matching, which will be
discussed later.
Screen-line survey
A screen-line survey can be performed using the same methods as the cordon survey. A screen-line was
defined above as an artificial divide. The survey is performed at the crossings of this screen-line.
The data may be used to fill gaps in, and to validate, data from household and cordon surveys.
However, care must be taken in correcting household data, because it might not be easy to conduct the
comparison without introducing bias.
Roadside interview
Roadside interviews provide useful information about trips that do not originate in the study area and therefore
cannot be detected in household surveys. They are often a better method for estimating trip-matrices
than home interviews because larger samples are possible. The data can also be useful for validating
and extending household based information.
In roadside interviews a sample of drivers and passengers of vehicles crossing a roadside station are
asked a limited set of questions. These must include at least origin, destination and trip purpose.
Information about age, sex and income is desired but seldom asked due to time limitations. The
experienced interviewer can however collect at least part of these data from observation of the vehicle
and its occupants. A typical roadside interview form can be seen in figure 2.6.
The conduct of a roadside interview survey requires a good deal of organisation and planning to avoid
unnecessary delays, ensure safety and deliver quality results. Important elements in the success of
these surveys are:
• Identification of suitable sites
• Co-ordination with the police
• Arrangements for lighting and supervision
The determination of sample size in roadside interview surveys is discussed in box 2.4.
$$n > \frac{p(1 - p)}{\left(\frac{e}{z}\right)^2 + \frac{p(1 - p)}{N}}$$
Where n is the number of passengers to survey, p is the proportion of trips with a given destination, e is
an acceptable error (expressed as a proportion), z is the standard Normal variate value for the required
confidence level, and N is the population size (i.e. observed passenger flow at a roadside station). It can
be seen that for a given N, e and z, the value p = 0.5 yields the highest (i.e. most conservative) value
for n in the above formula. Taking this value and considering e = 0.1 (i.e. maximum error of 10%) and
z = 1.96 (corresponding to a confidence level of 95%), the values shown in the following table are
obtained
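The bound can be evaluated directly for any observed flow. The following is a minimal sketch: the function name is invented, the flows in the loop are arbitrary examples rather than the classes of the original table, and the bound is simply rounded up to the next whole number.

```python
import math


def roadside_sample_size(N, p=0.5, e=0.1, z=1.96):
    """Sample size bound for a roadside interview, rounded up.

    N: observed flow at the station, p: proportion of trips with a given
    destination, e: acceptable error (as a proportion), z: standard Normal
    variate for the required confidence level.
    """
    bound = p * (1 - p) / ((e / z) ** 2 + p * (1 - p) / N)
    return math.ceil(bound)


# Arbitrary example flows, using the conservative p = 0.5, e = 0.1 and z = 1.96.
for flow in (100, 300, 500, 1000):
    n = roadside_sample_size(flow)
    print(f"N = {flow}: n = {n} ({n / flow:.0%} of the flow)")
```

The required sampling rate falls quickly as the flow increases, because the required number of interviews approaches a constant for large N.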
On-board survey
These can be seen as the counterpart of the roadside interview for public transport. A surveyor or
interviewer collects data on board a public transport vehicle. This can be done in two ways:
• Participatory surveys. The surveyor interviews the passengers, or hands out survey forms, which
have to be filled out during the trip or later.
• Non-participatory surveys. The surveyor counts the passengers getting on or off the vehicle and the
passengers on board between two stops. In a so-called fare-box survey, the surveyor records the
number of fares paid and the number of passes and transfers used, and the data are correlated with
the total fares taken in the fare-box. This is less useful when the majority of the passengers uses
This method is very susceptible to error. If a vehicle passes two observation points and is incorrectly
recorded at one of them, one inbound and one outbound trip will be deduced from the data, instead of
one through trip. The workload of the observers should be reduced in order to get better results.
This can be done by reducing the sampling rate and recording only part of the registration plate.
However, with too low a sampling rate certain cells in the matrix will not be observed. And if the records
of the plates are too brief, they lead to spurious matching, which means that vehicles recorded
as the same are in fact different vehicles. To reduce the last type of error it can help to record the
time at which each registration plate was observed.
Vehicles are increasingly equipped with a GPS system that helps drivers find their
way through the network. In commercial transport (e.g. cargo, bus companies) it is also used as a means
to monitor the location of the company vehicles in the network. It can be said that these vehicles are
electronically tagged and may therefore be used as probe vehicles that can be followed through the
network. This might be a source of O-D and route choice data. However, privacy concerns are expected
to prevent it from becoming an important source.
Headlight survey
This method can be used to observe the patterns of dispersion of vehicles from special events and
general O-D patterns. It involves placing a sign that asks drivers to put their headlights on and to keep
them lit until they finish their journey or are told otherwise. An observer directly downstream of the site
counts the proportion of vehicles that have indeed put their headlights on. Further on in the network the
vehicles passing with their headlights on are counted. This count is multiplied by the inverse of
the earlier observed proportion of vehicles having their headlights on. This gives an estimate of the flow
of vehicles from the original site to downstream sites. The method depends on the assumption that
drivers will not switch off their headlights before arriving at their destination, and that the pattern is
not distorted by other factors that influence drivers to put on their headlights (e.g. tunnels, bad
weather). Also, there can be vehicles that have their headlights on anyway. Therefore an observer
should be placed upstream of the sign to determine the proportion of vehicles with their headlights on already,
so this can be used as a correction.
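The basic estimate is a single division, as the sketch below shows. The text does not give a formula for the correction, so `corrected_headlight_flow` is only one possible way to apply it, under the assumption that vehicles not coming from the site have their headlights on at the same rate as observed upstream of the sign. All names and numbers are hypothetical.

```python
def headlight_flow(lit_downstream, p_lit_after_sign):
    """Basic estimate: lit vehicles counted downstream divided by the
    proportion of vehicles that had their headlights on just after the sign."""
    if not 0 < p_lit_after_sign <= 1:
        raise ValueError("proportion must be in (0, 1]")
    return lit_downstream / p_lit_after_sign


def corrected_headlight_flow(lit_downstream, total_downstream,
                             p_lit_after_sign, p_lit_before_sign):
    """One possible correction (an assumption, not from the text): vehicles not
    coming from the site are lit at the background rate seen upstream of the sign,
    so lit = F * p_after + (total - F) * p_before, solved here for F."""
    if p_lit_after_sign <= p_lit_before_sign:
        raise ValueError("the sign must raise the proportion of lit vehicles")
    background = total_downstream * p_lit_before_sign
    return (lit_downstream - background) / (p_lit_after_sign - p_lit_before_sign)


# Hypothetical counts at a downstream site.
print(headlight_flow(160, 0.8))
print(corrected_headlight_flow(200, 1000, 0.8, 0.05))
```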
• Correction by household size. Samples are usually selected from lists of addresses; therefore it is
possible that larger households are over-represented in the sample, and smaller households
under-represented, compared to the population. This should be corrected using household size data
for the entire population.
• Socio-demographic correction. This is necessary if differences in distribution of the variables sex and
age are detected between the sample and the population. The definitions of family and household
size should be consistent in both cases. This correction must be done after the correction by
household size.
• Non-response correction. Travel behaviour may differ between people who do and people who do
not answer the survey; the latter, being harder to reach, are likely to travel more. Correction may be
possible on the basis of the number of visits needed to complete the questionnaire at different types of
household. This correction must be performed after the previous two, and may induce significant
changes in the data.
• Correction for non-reported trips. The traditional type of home survey tends to underestimate
non-mandatory trips. The number of trips by purpose in the O-D survey should be checked against those in
the travel diaries. In travel diaries more detailed information on each journey should have been
gathered. For a method to perform the correction see box 2.5.
Sample expansion
After being corrected, the data have to be expanded to represent the total population. An expansion
factor can be defined for each study zone as the ratio between the total number of addresses in the
zone and the number of addresses in the sample. There is however a more accurate method, needed
because the information on the total number of addresses may be outdated. The expansion factor can
be determined using the following formula. In this formula Fi is the expansion factor, A is the total
number of addresses in the population list, B is the total number of addresses selected as the original
sample, C is the number of sampled addresses that were non-eligible in practice (e.g. demolished,
non-residential) and D is the number of addresses where no response was obtained.
$$F_i = \frac{A - A\,(C + CD/B)/B}{B - C - D}$$
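The expansion factor can be evaluated with a few lines of code. The following is a minimal sketch; the function name and the zone figures are hypothetical and only serve to show how the formula behaves compared with the simple ratio A/B.

```python
def expansion_factor(a: float, b: float, c: float, d: float) -> float:
    """Expansion factor F_i = (A - A*(C + C*D/B)/B) / (B - C - D).

    a: addresses in the population list
    b: addresses selected as the original sample
    c: sampled addresses that were non-eligible in practice
    d: sampled addresses where no response was obtained
    """
    responding = b - c - d
    if responding <= 0:
        raise ValueError("no responding addresses left in the sample")
    numerator = a - a * (c + c * d / b) / b
    return numerator / responding


# Hypothetical zone: 10 000 listed addresses, 500 sampled,
# 20 non-eligible and 80 without response.
f_i = expansion_factor(a=10_000, b=500, c=20, d=80)
simple_ratio = 10_000 / 500

print(f"expansion factor: {f_i:.2f}, simple ratio A/B: {simple_ratio:.2f}")
```

With no non-eligible addresses and no non-response (C = D = 0) the formula reduces to the simple ratio A/B.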
Validation of results
The data obtained in O-D surveys are normally submitted to three validation processes.
• On site checks of completeness and coherence of the data, followed by coding and digitising in the
office.
• Computational check of valid ranges for most variables and in general of the internal consistency of
the data.
• The corrected and expanded survey data are compared with the traffic counts on
cordons and screen-lines taken during the O-D survey. This usually presents some practical
problems, as in the case of car trips the route choice information is normally lacking.
• Divide the households into categories (say, defined by income, number of cars and family
size); the total number of categories is limited by the condition that each one must have at least
30 observations from the travel diary survey (i.e. to ensure that the mean trip rate is
approximately normally distributed).
• Calculate the average number of trips by purpose (and its variance) for each category, for
both the O-D survey and the travel diary data; let the means be Xa and Xb and the variances
Sa and Sb respectively. Calculate the absolute difference D = |Xa - Xb|.
• The minimum detectable difference (d) between the means of a certain variable X in two
samples with sizes Na and Nb, for an 80% probability of finding that their actual difference
(D) is significant at the 95% level, is given by:
$$d = 2.8\left(\frac{S_a}{N_a} + \frac{S_b}{N_b}\right)^{1/2}$$
• If D > d, the difference is significant; therefore, if the average trip rate in that category is
smaller in the O-D survey than in the travel diary, it has to be factored to equal the average trip
rate for the diaries. If the reverse occurs no correction is performed (i.e. the factor is one).
• If D ≤ d, the difference is not significant and no correction is required.
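The test in the box can be sketched as a small function. The names and the sample figures below are hypothetical; the sketch assumes that index a refers to the O-D survey and index b to the travel diaries, as in the box, and that a correction is only applied when the O-D survey under-reports.

```python
import math


def trip_rate_factor(mean_od, var_od, n_od, mean_diary, var_diary, n_diary):
    """Return the correction factor for the O-D trip rate of one category."""
    # Minimum detectable difference d = 2.8 * sqrt(Sa/Na + Sb/Nb)
    d_min = 2.8 * math.sqrt(var_od / n_od + var_diary / n_diary)
    shortfall = mean_diary - mean_od
    if shortfall > d_min:
        # Significant under-reporting: scale O-D rate up to the diary rate.
        return mean_diary / mean_od
    # Difference not significant, or O-D rate is the higher one.
    return 1.0


# Hypothetical category: 200 O-D households and 60 diary households.
factor = trip_rate_factor(mean_od=1.8, var_od=1.2, n_od=200,
                          mean_diary=2.4, var_diary=1.5, n_diary=60)
print(f"correction factor: {factor:.3f}")
```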
In practice it is hardly possible to experiment with, for example, a new mode of transport to collect data
on choice behaviour. Instead a stated preference (SP) survey can be performed, which consists of a
quasi-experiment based on hypothetical situations set up by the researcher. Individuals are asked in
interviews or questionnaires what they would choose to do in a hypothetical situation.
A basic problem of SP data collection is whether individuals would in reality choose the same option as
they stated they would. In the 1970s it appeared that only half the people did what they said they would
do. Fortunately this discrepancy has been reduced considerably due to improvements in survey design,
requirements for trained survey staff and quality assurance procedures.
Experimental design
First, a set of hypothetical but realistic alternatives must be constructed. These are called
technologically feasible alternatives. There are four tasks in designing these alternatives:
• The identification of the range of choices (e.g. car and rail, or different types of service within a
mode)
• The selection of attributes included in each option (e.g. travel time, cost, waiting time)
• The selection of the measurement unit for each attribute
• The specification of the number and magnitude of attribute levels
It should always be kept in mind that the alternatives should be designed in such a way that they
elicit realistic responses.
The attribute combinations presented in the alternatives are usually independent of one another. This
implies that the number of alternatives would be n^a, where a is the number of attributes and n is the
number of levels they can take. If this number of alternatives is used it is called a full factorial design.
However, when a or n increases, the number of alternatives increases rapidly, which will induce
fatigue in the respondent and reduce the value of the responses. Therefore a method called fractional
factorial design is used, which excludes part of the alternatives (on certain grounds) at the cost of being
unable to recover one or more interaction effects. The complexity of the design should however be
maintained so that there are up to three simultaneous changes in the alternatives. This has been
proven to result in the most reliable answers.
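The growth of a full factorial design is easy to see by enumerating it. The sketch below uses three hypothetical attributes with three levels each (the names and values are illustrative only), giving n^a = 3^3 = 27 alternatives.

```python
from itertools import product

# Hypothetical attributes, each with n = 3 levels.
attributes = {
    "travel_time_min": [20, 30, 40],
    "cost_eur": [2.0, 3.5, 5.0],
    "waiting_time_min": [5, 10, 15],
}

names = list(attributes)
# Every combination of one level per attribute is one alternative.
full_design = [dict(zip(names, levels)) for levels in product(*attributes.values())]

print(len(full_design))   # number of alternatives in the full factorial design
print(full_design[0])     # first alternative
```

Adding a fourth attribute with three levels would already give 81 alternatives, which illustrates why a fractional design is normally preferred.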
The way in which the attributes are presented must be similar to the way they are perceived by the
traveller. This can range from pictures of means of public transport to the way people perceive
“frequency”. In the latter case it has been shown that nobody thinks in trains per hour or per day, but
rather in, for example, the waiting time at the station until departure. Research has to be done before
starting the survey to ascertain in which way a certain attribute is perceived.
Questionnaire design
There are three main ways of collecting information on preferences about alternatives:
• Ranking responses. All alternatives are presented at once, and should be put in order of preference
by the respondent. This limits the number of alternatives that can be used without inducing fatigue.
Furthermore, respondents may see the ranking as a judgement, which does not
necessarily correspond to the type of choices they face in real life.
• Rating techniques. These are widely used in market research. The respondents are asked to express
their degree of preference for an option using an arbitrary scale. Usually this is a scale between 1
and 10, where 1 = ‘strong dislike’, 5 = ‘indifference’ and 10 = ‘strong preference’.
• Choice experiments. The respondents have to choose between two options. Instead of this binary
method a rating technique is also possible, in which the degree of preference is expressed on a
5-point scale: ‘definitely choose A’, ‘probably choose A’, ‘cannot choose’, ‘probably choose B’, and
‘definitely choose B’.
Sampling strategy
SP surveys are statistically efficient, because each interviewee produces not just one observation but
several. Samples can therefore typically be smaller than is the case with revealed preference surveys.
However, if each interview results in 10 responses on 10 hypothetical choices, this provides information
about the variation within the individual but not between the individuals. For a representative model
both kinds of information are needed, and only an adequately sized and representative sample can do
this.
To forecast demand it is thus necessary to survey many types of individuals in order to obtain
representative results. Otherwise large samples would be needed to achieve enough observations on
minority choices. Using a choice based sample may be very cost-efficient in this case, but it may induce
additional bias because of the different ways different individuals perceive the choice context.
In defining the network and study area a compromise will have to be made between the level of
detail (accuracy) and cost.
When the study area is known, a zoning system must be defined. In a zoning system the individual
households and premises in a study area are aggregated into manageable chunks so they can be used
in a model. These chunks are called internal zones. In the area outside the study area external zones
are defined, which are usually larger and are used to model the traffic entering, leaving or passing
through the study area. However, the latter is often omitted. Traffic between zones is called interzonal
traffic; traffic within zones is called intrazonal traffic. Of course the volume of intrazonal traffic
increases with increasing zonal size.
Zones vary in size, with the smallest about the size of a block in the downtown area (Central Business
District = CBD), whereas the largest on the urban fringe may be several square kilometres in area. An
area with a million people could have 700 to 800 zones. The study-area is accordingly divided into
zones. In general the following factors are considered relevant in the design of a zoning system for a
study area (Black, 1981):
• zones should contain distinctive land-use patterns such as residential or industrial use
• characteristics of the activities within a zone should be as homogeneous as possible so that derived
zonal means are representative of activity in the whole zone
• the zone system should conform to census (survey) collection areas (e.g. postal codes)
• zonal boundaries need to follow, where possible, rivers and other physical barriers to movement.
These zones are represented in the models as if all their characteristics (attributes) and properties were
concentrated in a single point called the zone centroid, connected to the road network with a centroid
connector, or zone connector.
4
Based on: J de D. Ortúzar & L.G. Willumsen, Modelling transport, 1994, p. 102-108
3.2.1 Schematisation
A transport network may be represented at different levels of aggregation. The highest level of
aggregation, for example, assumes a continuous equation of the average traffic capacity per unit of
area. Mostly, however, discrete elements are used which are called links.
Normal practice is to model the network as a directed graph, which is a system of nodes and links
joining them. The nodes are to represent junctions, and the links to represent homogeneous stretches
of road between two junctions. There can be virtual nodes needed if the attributes of a link change
elsewhere than on a junction. The links have attributes like length, speed, number of lanes etc. Links
have to be unidirectional so two-way links will be split up in two one way links. The zone centroids are
represented as nodes, and the centroid (or zone) connectors as links. See figure 3.1.
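The schematisation described above can be sketched as a small data structure. Node names, attribute values and the helper `two_way` are hypothetical; the point is only that every two-way road becomes two unidirectional links and that zone centroids are ordinary nodes joined to the network by connector links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    from_node: str
    to_node: str
    length_km: float
    speed_kmh: float
    lanes: int


def two_way(a: str, b: str, length_km: float, speed_kmh: float, lanes: int):
    """Split a two-way road into two one-way links."""
    return [Link(a, b, length_km, speed_kmh, lanes),
            Link(b, a, length_km, speed_kmh, lanes)]


links = []
links += two_way("N1", "N2", 1.2, 50, 1)   # road between junctions N1 and N2
links += two_way("N2", "N3", 0.8, 50, 1)   # road between junctions N2 and N3
links += two_way("Z1", "N1", 0.3, 30, 1)   # centroid connector of zone 1
links += two_way("Z2", "N3", 0.4, 30, 1)   # centroid connector of zone 2

# Adjacency list of the directed graph: outgoing links per node.
adjacency = {}
for link in links:
    adjacency.setdefault(link.from_node, []).append(link)

print(sorted(l.to_node for l in adjacency["N2"]))
```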
A problem in this network schematisation is that connectivity to each link joining a specific node is
offered at no cost, while some turning movements are more difficult to perform than others. One might
have to wait for a long time to turn left (or right, for countries driving left) for example. The turning
movements can be banned or penalised. This can be done manually by introducing dummy links with a
certain cost for each turning movement, or semi-automatically by an advanced computer program.
One of the key decisions is how many levels of the road hierarchy should be represented in the model.
Generally, only the roads that have a traffic-flow or access function will be represented. It is also said
that one should include at least one level of roads lower than the level of interest.
For public transport modelling special routes can be modelled with attributes such as frequency, capacity
and travel times. This will not be discussed here.
These are not all the attributes that can influence a driver's choice of a specific route; others have
also been found, such as tolls or the scenic quality of a route. These attributes depend on the location,
and should be included when this is assumed necessary.
When travel time is modelled as a function of flow there are two different cases:
• Delay on a link is assumed to depend only on the flow on the link itself (inter-urban area)
• Delay on a link depends in an important way on flow on other links (urban area)
When facing the issue of equilibration of supply and demand the first case is easier than the second.
However, there are techniques to balance demand and supply in the case of link-delay models
depending on flows on several links.
4.1 Introduction
In this chapter the stage of trip generation modelling will be discussed. After some introductory notes
three methods of trip generation modelling will be explained: growth factor modelling, regression
analysis and category analysis.
Trip production is largely dependent on the characteristics of the household (income, household
structure, mode availability), the characteristics of the zone (land-use, housing density,
industrialisation) and the accessibility of the zone (quality and quantity of transport possibilities
(infrastructure and modes)).
• Trip attraction model – derives the total number of trips attracted to a zone, irrespective of their
origin
Trip attraction is largely dependent on job availability, land-use (for industry, education, shops, health
care, banks, governmental offices, recreation, airports, harbour etc.)
The terms origin and destination do not always have the same meaning as production and attraction. In
home-based trips the home of the trip-maker is always the trip production. With non home-based trips
the trip is always produced at the origin. The zone of a non-home activity for a home-based trip, or the
destination zone for a non home-based trip, always attracts a trip.
This classification can be done by trip purpose, time of day, person type or modal choice.
By trip purpose
There are five categories that are usually employed:
• Trips to work
By time of day
Trips are usually classified in peak and off-peak trips. There are significant differences between these
periods, for example in trip purpose. There is also a difference between the morning and the evening
peak.
By person type
Travel behaviour is largely dependent on socio-economic attributes, so the classification by person type
is important. The following (stratified) categories are usually employed:
• Income level
• Car ownership
• Household size
By modal choice
The trips can be classified by modal choice, like car, train, bus or bicycle. Of course the availability of a
certain mode plays a role in this.
$$T_i = F_i\, t_i$$
where Ti and ti are respectively future and current trips in zone i and Fi is a growth factor.
The factor Fi should be estimated. Usually it is related to population (P), income (I) and car ownership
(C), in a function such as:
$$F_i = \frac{f(P_i^d, I_i^d, C_i^d)}{f(P_i^c, I_i^c, C_i^c)}$$
where f is a function of the given variables, and d and c denote the design and current years
respectively.
Although easy to use, the growth-factor method is subject to large errors, as can be seen in the example
in box 4.1. Therefore it is only used in practice to predict the future number of external trips to an area.
These trips are not very numerous and there are no simple alternative ways to predict them.
Consider a zone with 250 households with car and 250 households without car. Assuming we know the
average trip generation rates of each group:
we can easily deduce that the current number of trips per day is:
Let us also assume that in the future all households will have a car; therefore, assuming that income
and population remain constant (which is a safe hypothesis in the absence of other information), we
can estimate a simple multiplicative growth factor as:
$$F_i = C_i^d / C_i^c = 1 / 0.5 = 2$$
However, the method is very crude, as we will demonstrate. If we use our information about average
trip rates and make the assumption that these will remain constant, we can estimate the future
number of trips as:
which means that the growth factor method would overestimate the number of trips by approximately
42%. This is very serious because trip generation is the first stage of the modelling process; errors
here are carried through the entire process and may invalidate work on subsequent stages.
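The arithmetic of the box can be retraced in a few lines. The trip rates themselves are not legible in this copy of the box, so the sketch assumes 6.0 trips per day for households with a car and 2.5 for households without; these assumed values are chosen because they reproduce the growth factor of 2 and the overestimate of about 42% quoted above.

```python
households_with_car = 250
households_without_car = 250

# Assumed daily trip rates per household (not given in the text above).
rate_with_car = 6.0
rate_without_car = 2.5

# Current trips per day in the zone.
current_trips = (households_with_car * rate_with_car
                 + households_without_car * rate_without_car)

# Growth factor method: car ownership goes from 0.5 to 1 car per household.
growth_factor = 1.0 / 0.5
growth_factor_estimate = growth_factor * current_trips

# Constant trip rates: all 500 households now travel at the car-owner rate.
total_households = households_with_car + households_without_car
constant_rate_estimate = total_households * rate_with_car

overestimate = growth_factor_estimate / constant_rate_estimate - 1.0
print(current_trips, growth_factor_estimate, constant_rate_estimate)
print(f"overestimate: {overestimate:.0%}")
```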
$$Y = A + B_1 X_1 + B_2 X_2 + B_3 X_3$$
where Y is, for example, the number of trips per household, X1 the car ownership, X2 the family income
and X3 the family size in the case of a production equation. The A’s and B’s are coefficients determined through
multiple linear regression. For background on regression analysis the reader is referred to statistics
theory. Note that it is important to have at least some basic knowledge on this subject to be able to
judge the results of a regression exercise.
The model parameters are established using base-year data. Once the equations are calibrated, they
are used to estimate future travel for a target year. The coefficient of determination R², a
goodness-of-fit measure, may be used to assess how well the calibrated equation fits the data. R² takes
a value between 0 and 1. The closer R² is to 1, the better the linear relationship describes the data.
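As an illustration of the calibration step, the sketch below fits a linear trip production equation by ordinary least squares and reports the goodness-of-fit. The household data are invented for the example, and the helper names `solve` and `fit_linear` are hypothetical; in practice a statistics package would be used instead of this hand-written solver.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, y):
    """Least-squares fit of y = A + B1*x1 + ... ; returns (coefficients, R^2)."""
    design = [[1.0] + list(r) for r in rows]          # add the constant term
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coeffs = solve(xtx, xty)                          # normal equations
    fitted = [sum(c * v for c, v in zip(coeffs, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coeffs, 1.0 - ss_res / ss_tot


# Hypothetical households: (cars, income in 1000 euro/year, household size)
households = [(0, 20, 1), (0, 25, 2), (1, 30, 2), (1, 35, 3),
              (1, 40, 4), (2, 45, 3), (2, 55, 4), (2, 60, 5)]
trips = [2.0, 3.1, 4.8, 6.0, 7.2, 7.9, 9.5, 10.4]     # trips/household/day

coefficients, r_squared = fit_linear(households, trips)
print([round(c, 3) for c in coefficients], round(r_squared, 3))
```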
5
More information on multiple regression analysis can be found in any textbook on statistics, e.g. Alan Field’s
Discovering Statistics using SPSS – Chapter 5.
Question: Suppose we have Y = 50.5 + 0.80X1 + 1.75X3, with R² = 0.34. Should you use this trip
generation model?6
Recall that residential land-use can be seen as an important trip generator (producer). Non-residential
land-use (shopping, industry etc.) in many cases is a good attractor of trips.
To calculate the total number of trips produced in a zone, Oi (the total number of origins, i.e. trips
departing zone i), from a regression model at household level7, we have to sum the trips Oh produced
by all households h in the zone, Oi = Σh Oh, in the case of:
with Eh number of employed residents in household h, Ch the number of cars available to household h.
Besides this we might have an equation for the trips attracted (the total number of destinations Dj, trips
arriving in zone j) to the zones (in this case at zonal level already!):
where Ez is the employment in zone z and Rz the retail floor space in the zone in m².
In table 4.1 an example is given of some trip generation rates taken from the Trip Generation handbook
of the (American) Institute of Transportation Engineers. This handbook (3 volumes) gives trip
generation rates for all kinds of land-uses.
In this method the cell rates are computed per purpose group by dividing the total number of trips in a
cell h, by purpose p, by the number of households H(h) in it, as follows:
6
The R² is far from 1, and therefore the variables used are apparently not representative enough for estimating the
trip generation in the area. Incorporating some other variables might solve the problem. Another reason might be
the sample size. The number of families or companies interviewed may be too small to aggregate.
7
Sometimes also zonal regression models exist which calculate the total number of trips originating in a zone as a
function of for example population Pi, number of houses Hi and cars Ci in a zone like a study in Toronto
Oi=0.351Pi+0.145Hi-0.253Ci. Explain the minus sign for Ci!
$$t^p(h) = \frac{T^p(h)}{H(h)}$$
where tp(h) is now the average number of trips with purpose p (and at a certain time period) made by
members of households of type h. Types are defined by the stratification chosen: for example, a cross-
classification based on m household sizes and n car ownership classes will yield mn types h.
It has been shown the trip production of households is largely dependent on the car ownership and
household size. Suppose a division is made in 3 classes of car ownership and 4 classes of household
size. The number of trips per household is now determined as:
Like the regression models, the category method also allows for aggregation, by multiplying the trip
rates by the number of households of type h in zone i, ai(h). If Hn(h) is the set of households of type h
containing persons of type n, then the total trip productions with purpose p by person type n in zone i
are as follows:
$$O_i^{np} = \sum_{h \in H^n(h)} a_i(h)\, t^p(h)$$
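A minimal sketch of the category method for one purpose and one person type follows. All figures are hypothetical: household types h are keyed by (household size class, car ownership class), the cell rates are computed from invented survey totals, and the zonal productions are obtained by weighting the rates with the number of households of each type in the zone.

```python
# Survey totals per household type h = (size class, car class): hypothetical.
observed_trips = {(1, 0): 45, (1, 1): 64, (2, 0): 54, (2, 1): 116}
observed_households = {(1, 0): 50, (1, 1): 40, (2, 0): 30, (2, 1): 40}

# Cell rate t(h): trips in the cell divided by households H(h) in the cell.
trip_rates = {h: observed_trips[h] / observed_households[h]
              for h in observed_trips}

# a_i(h): households of type h living in zone i (hypothetical zone).
households_in_zone = {(1, 0): 120, (1, 1): 80, (2, 0): 60, (2, 1): 140}

# Zonal productions: sum over household types of a_i(h) * t(h).
productions = sum(households_in_zone[h] * trip_rates[h] for h in trip_rates)

for h, rate in sorted(trip_rates.items()):
    print(h, round(rate, 2))
print(round(productions, 1))
```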
4.4 Balancing
Since trips which originate somewhere always have to end somewhere, there is always the condition
that the total productions ΣiOi and the total attractions ΣjDj are equal (in balance). Unfortunately,
since the input data normally come from different sources (production data are socio-economic data
from household interviews, and attraction data come from aggregated statistical sources on land-use),
this is seldom the case (ΣiOi ≠ ΣjDj). Normally, production data are more accurate and therefore zonal
attractions are multiplied by a ‘correction’ (balancing) factor f:
$$f = \frac{\sum_{i=1}^{I} O_i}{\sum_{j=1}^{J} D_j}$$
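Balancing amounts to one division and one multiplication per zone. In the sketch below the productions and the unbalanced attractions are hypothetical figures; only the procedure follows the text.

```python
productions = [400.0, 460.0, 400.0, 702.0]   # O_i, considered the more reliable
attractions = [250.0, 390.0, 480.0, 780.0]   # D_j before balancing

# Balancing factor f = sum of productions / sum of attractions.
f = sum(productions) / sum(attractions)

# Scale every zonal attraction so that both totals agree.
balanced_attractions = [f * d for d in attractions]

print(round(f, 4))
print([round(d, 1) for d in balanced_attractions])
```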
5.1 Introduction
In this chapter the stage of trip distribution modelling will be discussed. Since we know for every zone
in our study area the total number of trips departing and arriving (Oi and Dj), we are now interested in
where those productions go to, and where the attractions come from. The outcome is of course the
origin-destination trip table.
An important concept used in this stage is the so-called impedance. Impedance can represent travel
time, cost, distance, or a combination of factors. Generally, impedance is the weighted sum of various
types of times and types of cost. It is therefore also called generalised cost; as an equation:
$$c_{ij} = \min_r \left( T_{ij} + \frac{C_{ij}}{\gamma} \right)$$
where the minimum generalised cost of a trip from i to j, taken over the routes r, is a combination of
the travel time Tij plus the money costs Cij, translated into time using the value-of-time γ (in the
Netherlands, for example, something between 5 (recreational) and 20 (business) euro per hour).
                        Arrivals
Departures      1      2     ...     j     ...     n     Σj Tij
    1          T11    T12    ...           ...    T1n      O1
    2          T21    T22    ...           ...    T2n      O2
   ...
    i                        ...    Tij    ...             Oi
   ...
    m          Tm1    Tm2    ...           ...    Tmn      Om
  Σi Tij       D1     D2     ...    Dj     ...    Dn       T
Table 5.1 Standard OD trip matrix
In the uniform growth factor method each cell of the OD table is multiplied by the general
growth rate τ:
$$T_{ij} = \tau\, t_{ij} \quad \forall\, i, j$$
where Tij are the ‘updated’ future trips, and tij the ‘old’ base-year trips.
                   Arrivals
Departures      1      2      3      4    Σj Tij
    1           5     50    100    200      355
    2          50      5    100    300      455
    3          50    100      5    100      255
    4         100    200    250     20      570
  Σi Tij      205    355    455    620     1635
Table 5.2 Base-year trip matrix
Question: Calculate the future matrix according to the uniform growth factor method, assuming no
other information is available to you (τ = 1.2)9.
Suppose that for the origins the growth is already predicted, which is displayed in the column ‘Target
Oi’.
                   Arrivals
Departures      1      2      3      4    Σj Tij   Target Oi
    1           5     50    100    200      355       400
    2          50      5    100    300      455       460
    3          50    100      5    100      255       400
8
Example taken from (Ortúzar & Willumsen, 1994).
9
Just multiply each cell with the growth factor 1.2, for example cell (1,1) becomes 6, the row total ∑jT1j (total trip
production of row 1) becomes 6+60+120+240 = 426 trips.
By multiplying each row i by the ratio τi = (Target Oi) / Σj tij the updated OD table is derived.
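Both growth factor variants can be applied to the base-year matrix of table 5.2 in a few lines. The sketch below is only an illustration; the target for origin 4 (702) is taken from the doubly constrained example further on, and the variable names are arbitrary.

```python
base_year = [
    [5, 50, 100, 200],
    [50, 5, 100, 300],
    [50, 100, 5, 100],
    [100, 200, 250, 20],
]

# Uniform growth: every cell is multiplied by the same factor tau.
tau = 1.2
uniform = [[tau * t for t in row] for row in base_year]

# Singly constrained growth: one factor per origin, tau_i = target O_i / row total.
target_o = [400, 460, 400, 702]
singly = []
for row, target in zip(base_year, target_o):
    tau_i = target / sum(row)
    singly.append([tau_i * t for t in row])

print([round(sum(row), 2) for row in uniform])   # row totals, uniform growth
print([round(sum(row), 2) for row in singly])    # row totals equal the targets
```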
You might wonder, for example, whether you can take the average growth factor (τi + Γj)/2?10
Several methods to solve this exist, but the best one proven is the so-called Furness method using
balancing factors. This method will also be useful when discussing the gravity model.
These factors ai and bj have to be recalculated until ΣjTij = Oi and ΣiTij = Dj, according to an iterative
process called the Furness method:
1) set all bj = 1.0 and solve for ai (as in the singly constrained case)
2) with the latest ai, solve for bj, i.e. satisfy the attraction constraint
3) keeping bj fixed, solve for ai, and repeat 2) and 3) until the changes are sufficiently small, e.g. less
than 5% difference between target and estimated value.
This method is also called bi-proportional fitting of the base-year data to expected future data.
An example will make clear that the theory of this method is much more difficult than the ‘doing’.
10
If you try this you will notice that the average growth factor is not converging to your target productions and
attractions.
11
Remember from previous paragraph that the total productions and attractions had to be balanced, otherwise we
would have had a problem at this stage, with a different row and column total!
UNIVERSITY OF TWENTE, THE NETHERLANDS 55
First calculate the correction factors ai by dividing the target Oi’s by the row totals Σj Tij, which
gives the next table:
Step 1:
                   Arrivals
Departures      1      2      3      4    Σj Tij   Target Oi     ai
    1           5     50    100    200      355       400       1.13
    2          50      5    100    300      455       460       1.01
    3          50    100      5    100      255       400       1.57
    4         100    200    250     20      570       702       1.23
  Σi Tij      205    355    455    620     1635
 Target Dj    260    400    500    802     1962
    bj          1      1      1      1
Step 2:
Next ‘correct’ all rows with the ai’s and derive the new bj’s, resulting in:
                       Arrivals
Departures        1         2         3         4     Σj Tij   Target Oi    ai
    1            5.63     56.34    112.68    225.35   400.00      400        1
    2           50.55      5.05    101.10    303.30   460.00      460        1
    3           78.43    156.86      7.84    156.86   400.00      400        1
    4          123.16    246.32    307.89     24.63   702.00      702        1
  Σi Tij       257.77    464.57    529.51    710.14
 Target Dj     260       400       500       802      1962
    bj           1.01      0.86      0.94      1.13
Step 3:
The biggest difference between a column total and its target value is 14% (bj = 0.86), so the next step
is to multiply all columns by the bj’s, obtaining:
                       Arrivals
Departures        1         2         3         4     Σj Tij   Target Oi    ai
    1            5.68     48.51    106.40    254.50   415.09      400       0.96
    2           50.99      4.35     95.46    342.53   493.33      460       0.93
    3           79.11    135.06      7.41    177.15   398.73      400       1.00
    4          124.22    212.08    290.73     27.82   654.85      702       1.07
  Σi Tij       260.00    400.00    500.00    802.00
 Target Dj     260       400       500       802
    bj           1         1         1         1
The biggest difference is now 7% (ai = 1.07).
The solution of this problem, after three iterations on rows and columns (three sets of corrections for all
rows and three for all columns), can be shown to be12:
                       Arrivals
Departures        1         2         3         4     Σj Tij   Target Oi
    1            5.25     44.12     98.24    254.25   401.85      400
    2           45.30      3.81     84.78    329.11   462.99      460
    3           77.04    129.50      7.21    186.58   400.34      400
    4          132.41    222.57    309.77     32.07   696.82      702
  Σi Tij       260       400       500       802      1962
 Target Dj     260       400       500       802      1962
Table 5.6 Expanded doubly-constrained growth trip matrix
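The iterations of the worked example can be sketched in code. This is an illustrative implementation, not a reference one: the function name `furness`, its arguments and the stopping rule (largest relative row error after the column step) are choices made for this sketch. One pass of row and column corrections on the base-year matrix gives the Step 3 table above (for example 5.68 in cell (1,1)).

```python
def furness(seed, target_o, target_d, tolerance=0.05, max_iterations=100):
    """Bi-proportional fitting of a seed matrix to row and column targets.

    Returns the fitted matrix and the number of row/column passes performed.
    """
    t = [list(row) for row in seed]
    for iteration in range(1, max_iterations + 1):
        # Row step: factors a_i scale each row to its target O_i.
        for i, row in enumerate(t):
            a_i = target_o[i] / sum(row)
            t[i] = [a_i * value for value in row]
        # Column step: factors b_j scale each column to its target D_j.
        for j, d_j in enumerate(target_d):
            b_j = d_j / sum(row[j] for row in t)
            for row in t:
                row[j] *= b_j
        # Columns are now exact; check how far the rows have drifted.
        worst = max(abs(sum(row) / o - 1.0) for row, o in zip(t, target_o))
        if worst <= tolerance:
            return t, iteration
    return t, max_iterations


base_year = [
    [5, 50, 100, 200],
    [50, 5, 100, 300],
    [50, 100, 5, 100],
    [100, 200, 250, 20],
]
target_o = [400, 460, 400, 702]
target_d = [260, 400, 500, 802]

matrix, passes = furness(base_year, target_o, target_d, tolerance=0.05)
print(passes)
for row in matrix:
    print([round(v, 2) for v in row], round(sum(row), 2))
```

A tighter tolerance simply lets the loop run for more passes; the targets must be balanced (equal totals) for the process to converge on both sets of constraints.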
12
Try this out yourself. This is easiest using some kind of spreadsheet programme like MS-Excel.
Check that the remaining error is now below the target accuracy of 5%!
In general, if the generalised cost of making a trip increases, the ‘likeliness’ f(cij) of making that trip
decreases, which sounds quite logical. An important type of distribution function is the negative
exponential function:
$$f(c_{ij}) = \exp(-\beta c_{ij})$$
with β a calibrated coefficient, with a value around 0.05. Another one is the power function:
$$f(c_{ij}) = c_{ij}^{-\alpha}$$
These functions do not represent reality very well. With the use of the car, for example, we normally
expect that for the shorter distances the distribution value is small and increases only after cij has a
moderate value, up to a certain maximum after which it behaves again as an exponential function
(decreasing with increasing costs).
To solve this problem the exponential distribution function is usually combined with the power function:
$$f(c_{ij}) = c_{ij}^{\alpha} \exp(-\beta c_{ij})$$
with α = 0.5 and β = 0.12. The power function, negative exponential and combined function are shown in
figure 5.1.
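The three distribution functions are easy to compare numerically. In the sketch below the values β = 0.05 and α = 0.5, β = 0.12 are those quoted in the text; the exponent used for the pure power function (2.0) and the list of cost values are assumptions made only for the illustration.

```python
import math


def negative_exponential(cost, beta=0.05):
    return math.exp(-beta * cost)


def power(cost, alpha=2.0):          # alpha here is an assumed value
    return cost ** -alpha


def combined(cost, alpha=0.5, beta=0.12):
    return cost ** alpha * math.exp(-beta * cost)


costs = [1, 2, 4, 6, 10, 20, 40]
for c in costs:
    print(c, round(negative_exponential(c), 3),
          round(power(c), 4), round(combined(c), 3))

# The combined function first rises and then falls; analytically its
# maximum lies at cost = alpha / beta.
peak_cost = max(costs, key=combined)
print(peak_cost, round(0.5 / 0.12, 2))
```

The first two functions decrease over the whole range, whereas the combined function peaks at a moderate cost, which is the behaviour described above for car trips.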
This distribution function is now used in the gravity model. The gravity model is based on Newton’s
law of gravitation. Trip making behaviour is influenced by external factors such as total trip ends (Oi
and Dj) and distance travelled (distribution function). In its simplest formulation it states that the
number of trips between two zones is proportional to the product of both populations (P) divided by the
square of the distance (d) between the two zones:
$$T_{ij} = \alpha \frac{P_i P_j}{d_{ij}^2}$$
In practice the gravity model is applied to the singly (only the productions or attractions are known for
the future) and doubly (both the productions and attractions are known for the future) constrained
case. Again the balancing factors Ai and Bj are introduced (to replace α (or µ)) and are grouped again to
form ai and bj.
$$T_{ij} = a_i\, b_j\, f(c_{ij})$$
which is the same as with the growth factor method, but with the base-year table tij replaced by the
initial impedances14 (a function of the generalised costs) of travelling between each origin and destination.
An example is given using the negative exponential function as distribution function, f(cij) = exp(-βcij).
First we want to know the initial ‘costs’ cij (in time and/or money) of travelling between each O and D,
as well as the future trip-ends (target Oi and Dj).
From this table we derive the ‘likeliness’ that someone is making that trip. For example, it seems
unlikely that many people will go from zone 4 to zone 1, since the cost is very high: 24. Maybe this is
because there is a canal between zones 4 and 1, making the trip very lengthy by forcing people to make
a big detour.
13
Intuitively you must agree that an increase in resistance from 5 to 10 minutes has a larger impact on one's
destination choice than the same absolute increase in travel time from 120 to 125 minutes. Therefore the
negative exponential function will only fit empirical data to a certain extent.
14
Most textbooks on transport modelling give different names for the resistance of travelling: impedance, friction,
generalised costs etc. Also distribution function, friction function etc.
                   Arrivals
Departures      1       2       3       4     Σj Tij
    1          0.74    0.33    0.17    0.11    1.35
    2          0.30    0.74    0.30    0.15    1.49
    3          0.21    0.27    0.61    0.50    1.59
    4          0.09    0.17    0.45    0.61    1.31
  Σi Tij       1.34    1.51    1.52    1.36    5.74
Table 5.8 Matrix exp(-βcij) and sums to prepare for a gravity model run
This table together with the target values forms the basis for the Furness problem previously discussed.
                      Arrivals
Departures       1          2         3         4      Σj Tij
    1         155.37      99.00     64.46     74.17    393.36
    2          57.54     200.22    106.73     90.98    455.56
    3          25.87      47.01    137.16    192.77    402.81
    4          20.86 (15) 53.77    191.65    444.08    710.37
  Σi Tij      260        400       500       802      1962
Table 5.10 The resulting gravity model matrix
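The gravity model run can be sketched by feeding the matrix of table 5.8 into the same row and column balancing used for the growth factor method. The function name and the fixed number of passes are choices made for this sketch, and because the seed values are the rounded figures of table 5.8 and the loop runs to convergence, the output will be close to, but not identical with, table 5.10.

```python
# exp(-beta * c_ij) values as printed in table 5.8
deterrence = [
    [0.74, 0.33, 0.17, 0.11],
    [0.30, 0.74, 0.30, 0.15],
    [0.21, 0.27, 0.61, 0.50],
    [0.09, 0.17, 0.45, 0.61],
]
target_o = [400, 460, 400, 702]   # future productions O_i
target_d = [260, 400, 500, 802]   # future attractions D_j


def doubly_constrained_gravity(f, o, d, passes=500):
    """T_ij = a_i * b_j * f(c_ij), with a_i and b_j found by balancing."""
    t = [list(row) for row in f]
    for _ in range(passes):
        # Scale rows to the production targets.
        t = [[v * o_i / sum(row) for v in row] for row, o_i in zip(t, o)]
        # Scale columns to the attraction targets.
        col_sums = [sum(col) for col in zip(*t)]
        t = [[v * d_j / s for v, d_j, s in zip(row, d, col_sums)] for row in t]
    return t


trips = doubly_constrained_gravity(deterrence, target_o, target_d)
for row in trips:
    print([round(v, 1) for v in row], round(sum(row), 1))
```

In the result the flow from zone 4 to zone 1 remains the smallest cell, in line with the high cost of that movement.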
This OD table is very important, because at this stage we know not only how many trips are produced
in and attracted to every zone, but also how many trips go from one zone to another.
A weakness of this OD table is that the result is based on the use of a distribution function that uses
a unimodal travel cost cij, whereas you should expect that different modes have different distribution
effects. To cope with this some authors have proposed using the minimum cost of traversing a certain
pair ij (min{cijm}), or using the average cost for cij, as in the following formula, with M the
number of modes:
$$c_{ij} = \frac{\sum_m c_{ij}^m}{M}$$
Or, a weighted average cost function, where bijm is the mode choice proportion.
15
Indeed a small number of trips go from zone 4 to zone 1! Imagine now what would have happened if the connection
between zone 4 and zone 1 was made ‘cheaper’ (in length and/or time) by the construction of a tunnel.
5.3 Tri-proportional fitting
In sections 5.1 and 5.2 the method of bi-proportional fitting was used to solve doubly-constrained
problems. The constraints concerned the total of origins and destinations in a zone. However, a third
constraint, concerning trip length distribution, can be introduced.
The trip length can also be seen as the impedance, or cost used in the gravity model. Instead of trip
length ranges, cost-bins will be used.
The following example will illustrate the tri-proportional fitting method, using the same problem as
in section 5.2. The model can be solved using Furness iterations as shown before; however,
three steps now have to be taken in each iteration instead of two.
The balancing factors are now ai, bj and Fk. The target ranges are given in table 5.11, the resulting
matrix after iterations in table 5.12.
Ranges (cost-bins)   1.0-4.0   4.1-8.0   8.1-12.0   12.1-16.0   16.1-20.0   20.1-24+
Number of trips      365       962       160        150         230         95
Table 5.11 Target values trip length distribution for tri-proportional gravity model calibration
Ranges (cost-bins)   1.0-4.0   4.1-8.0   8.1-12.0   12.1-16.0   16.1-20.0   20.1-24+
Number of trips      365       962       160        150         230         95
Tk                   360.9     966.5     159.0      149.8       230.3       95.5
Fk                   224.55    220.13    87.54      102.05      54.66       34.90
Table 5.12 The resulting matrix after five complete iterations
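The three steps per iteration can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the three-zone cost matrix, the trip-end totals and the cost-bin targets below are hypothetical and are not the data of section 5.2, and the function name tri_proportional_fit is our own. Each iteration scales the rows (factors ai), then the columns (factors bj), then the cells of every cost-bin (factors Fk).

```python
def tri_proportional_fit(costs, origins, destinations, bin_upper, bin_targets,
                         iterations=200):
    """Scale a start matrix until row, column and cost-bin totals match their targets."""
    n = len(costs)
    # Cost-bin index k of every cell: the first bin whose upper limit is not exceeded.
    bins = [[next(k for k, upper in enumerate(bin_upper) if c <= upper) for c in row]
            for row in costs]
    trips = [[1.0] * n for _ in range(n)]  # neutral start matrix
    for _ in range(iterations):
        # Step 1: balance the rows (factors a_i).
        for i in range(n):
            a = origins[i] / sum(trips[i])
            trips[i] = [t * a for t in trips[i]]
        # Step 2: balance the columns (factors b_j).
        for j in range(n):
            b = destinations[j] / sum(trips[i][j] for i in range(n))
            for i in range(n):
                trips[i][j] *= b
        # Step 3: balance the cost-bins (factors F_k).
        totals = [0.0] * len(bin_targets)
        for i in range(n):
            for j in range(n):
                totals[bins[i][j]] += trips[i][j]
        for i in range(n):
            for j in range(n):
                k = bins[i][j]
                trips[i][j] *= bin_targets[k] / totals[k]
    return trips


# Hypothetical three-zone example (illustrative values only).
costs = [[2, 6, 10], [6, 2, 6], [10, 6, 2]]
origins = [60, 50, 60]
destinations = [60, 50, 60]
bin_upper = [4, 8, 12]       # upper limits of the cost-bins
bin_targets = [30, 80, 60]   # target number of trips per cost-bin
fitted = tri_proportional_fit(costs, origins, destinations, bin_upper, bin_targets)
for row in fitted:
    print([round(t, 1) for t in row])
```

The targets in this sketch are mutually consistent (all three sets sum to 170 trips); with inconsistent targets the three steps keep undoing each other and the iteration does not settle.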
External trips
Trips with one end outside the study area are not considered in the synthetic gravity model. This is
not possible, because the distance or cost of external trips is unknown.
The practice is to exclude external trips from the synthetic modelling process. Roadside interviews at
cordon points will lead to the desired information on external-external and external-internal trips. These
data can then be updated using the Furness growth factor method. A number of the trip ends from the
trip attraction models correspond to the external-internal trips and have to be subtracted from the trip
end totals to be used as constraints.
Intrazonal trips
To estimate the number of intrazonal trips, cost values can be given to centroid connectors, which is a
crude method but a necessary approximation in some cases. However, it is preferable to remove intrazonal
trips from the synthetic modelling process and forecast them, for example, as a fixed proportion of the
trip ends.
Intrazonal trips do not use the modelled network, so it is less essential to model them accurately.
If a coarse zoning system is used, however, the problem can be significant.
Productions-Attractions, Origins-Destinations
To assign a trip matrix onto a network it is necessary that it has the shape of an origin-destination
matrix. However, in synthetic modelling, the trip productions and attractions are used. As we have seen
before, in home based trips the home end is always the production end, which would in case of a trip to
work and home again lead to two trips with the same production and attraction end, but with different
origin and destination.
In case of a 24-hour trip matrix the OD-matrix is almost the same as the production/attraction matrix,
as it is assumed that each produced trip is made once in each direction every day.
For a shorter time period this is different, because it is not certain in which direction a trip will take
place. There are two approaches to solve this:
1. Produce a matrix for a single purpose, typically ‘to work’, and assume that these trips follow just one
direction of travel (production-attraction during the morning peak). Some corrections have to be made for
shift work, flexible working hours etc.
2. Use survey data to determine the proportion of the matrices for each purpose that falls within the
time period, for example 70% production-attraction and 30% attraction-production trips.
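The second approach can be sketched as follows. This is a minimal illustration, assuming a single directional share for the whole matrix; the function name pa_to_od and the two-zone matrix are hypothetical.

```python
def pa_to_od(pa, share_pa=0.7):
    """Turn a production/attraction matrix into an OD matrix for one time period.

    share_pa is the fraction of trips travelling from the production end to
    the attraction end; the remainder travels in the opposite direction.
    """
    n = len(pa)
    return [[share_pa * pa[i][j] + (1.0 - share_pa) * pa[j][i] for j in range(n)]
            for i in range(n)]


# Hypothetical matrix: 100 trips produced in zone 1 and attracted to zone 2,
# 20 trips produced in zone 2 and attracted to zone 1.
pa = [[0, 100], [20, 0]]
od = pa_to_od(pa, share_pa=0.7)
print(od)
```

In this example the OD flow from zone 1 to zone 2 is 0.7*100 + 0.3*20 = 76 trips and the reverse flow is 44 trips, so the total of 120 trips is preserved.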
6.1 Introduction
In this chapter modal split modelling will be discussed. At this stage in the modelling process we have
information on the number of trips between every origin and destination in the study area. These are
person-trips, not vehicle-trips, since we do not yet know who is cycling, walking, taking a car, using
public transport etc. Therefore we need to predict the mode use, the so-called modal split.
The factors influencing the choice of mode may be classified into three groups (Ortúzar & Willumsen, 1994):
1. Characteristics of the trip maker. The following features are generally believed to be important:
   - car availability and/or ownership
   - possession of a driving license
   - household structure (young couple, couple with children, retired, single etc.)
   - income
   - decisions made elsewhere, for example the need to use a car at work, take children to school, etc.
   - residential density
2. Characteristics of the journey. Mode choice is strongly influenced by:
   - the trip purpose; for example, the journey to work is normally easier to undertake by public
     transport than other journeys because of its regularity and the adjustment possible in the long run
   - time of the day when the journey is undertaken. Late trips are more difficult to accommodate
     by public transport
3. Characteristics of the transport facility. These can be divided into two categories.
   a. quantitative factors such as:
      - relative travel time: in-vehicle, waiting and walking times by each mode
      - relative monetary costs (fares, fuel and direct costs)
      - availability and cost of parking
   b. qualitative factors which are less easy to measure:
      - comfort and convenience
      - reliability and regularity
      - protection, security
depending on each mode’s relative desirability for any given trip. Modes are said to be relatively more
desirable if they are faster, cheaper, or have other more favourable features than competing modes.
The first models included only one or two characteristics of the journey, typically (in-vehicle) travel
time. It was observed that an S-shaped curve seemed to represent this kind of behaviour better. The
curve plots the proportion of trips by mode 1 (T’ij/Tij) against the cost or time difference.
Apart from the use of the logit curve there exists the sequential mode choice model using the theory of
utility as discussed in Intermezzo II. The better a mode is, the more utility it has for the potential
traveller. The logit model takes the following form to trade off the relative utilities of the various
modes; the probability of using mode i, Pi, is given by:
$$ P_i = \frac{e^{U(i)}}{\sum_{r=1}^{n} e^{U(r)}} $$
where U(i) is the utility of mode i, U(r) is the utility of mode r and n the number of modes under
consideration. This model is also called a multinomial logit model (or a binomial logit model if only two
modes, often car versus public transport, are considered).
An example in box 6.1 will show the use of a simple binomial logit model.
What is the probability that a person from a certain zone with an average income per inhabitant of USD
10,000 will travel by public transport?
$$ P_{PT} = \frac{e^{U_{PT}}}{e^{U_{PT}} + e^{U_{car}}} = \frac{e^{-2.6}}{e^{-2.6} + e^{-1.4}} = 0.23 $$
or 23%, so Pcar = 1 - PPT = 77%.
If now for example the OD table for this zone gives 620 trips to a certain other zone, determine the
number of trips by each mode.
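The calculation can be checked with a short script. This is a minimal sketch: the utilities -2.6 and -1.4 are the values appearing in the expression above, while the function name logit_shares is our own.

```python
import math


def logit_shares(utilities):
    """Multinomial logit: share of each mode given a utility per mode."""
    exp_u = {mode: math.exp(u) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


shares = logit_shares({"public transport": -2.6, "car": -1.4})
trips = {mode: 620 * share for mode, share in shares.items()}
for mode in shares:
    print(f"{mode}: share {shares[mode]:.3f}, trips {trips[mode]:.1f}")
```

With these utilities the 620 trips split into roughly 144 trips by public transport and 476 trips by car.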
Normally, once the modal split is known for the different zones of the study area, the OD trip table (from
the trip distribution step) is transformed into n mode-specific OD trip tables, using the modal split and
the vehicle occupancy. Using vehicle-occupancy factors we can translate, for example, 300 car trips in a
certain zone into the number of vehicles we will actually find on the road: a vehicle-occupancy factor of
1.3 gives 300/1.3 ≈ 231 vehicle-trips. For bicycle and pedestrian trips this factor is of course 1.0. The
factor is normally not used for public transport, since occupancy factors are route- and time-dependent.
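As a small illustration of this conversion (the function name vehicle_trips is our own):

```python
def vehicle_trips(person_trips, occupancy):
    """Convert person-trips by a mode into vehicle-trips using an occupancy factor."""
    return person_trips / occupancy


car_vehicles = vehicle_trips(300, 1.3)   # about 231 vehicles
bicycles = vehicle_trips(300, 1.0)       # occupancy 1.0 leaves the number unchanged
print(round(car_vehicles), round(bicycles))
```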
The simultaneous trip distribution/modal split model incorporates mode-specific distribution functions
fm(cijm ) in the gravity model:
$$ T_{ij} = \sum_{\forall m} T_{ijm} = a_i b_j \sum_{\forall m} f_m(c_{ijm}) $$
, which means that the distribution of trips over the different modes m between every origin O and
destination D is proportional with the mode-specific distribution function values fm(cijm ).
The only thing left to analyse is the actual number of vehicles we will encounter on the roads.
This is the last step, route assignment, in which we translate the OD tables into vehicles on the
different routes in the network.
16 Why are all signs negative?
7.1 Introduction
In this chapter traffic assignment modelling will be discussed.
In the assignment procedure the transport engineer or planner predicts the routes the vehicle-trips
estimated in the previous steps will take and assigns them to the different links. For example, if a trip
goes from a suburb to downtown, the model predicts the specific streets or public transport routes to be
used. The trip assignment process begins by constructing a map representing the vehicle and public
transport networks in the study area. These maps show the possible routes that trips can take.
The intersections, called nodes, are identified on the network map, so that the sections between them,
called links, can be identified. After the links are identified by their nodes, the length, type of
facility, location in the area, number of lanes, speed, maintenance condition and travel time are
recorded for each link. For public transport, extra information on routes, frequencies, headways etc. is
necessary. This information allows us (in practice the computer) to determine the routes that travellers
might take between any two points (nodes) on the network and to assign trips between zones (each
represented by its centroid, linked to the network by a centroid connector) to these routes.
The output of trip assignment shows the routes that all trips will take, and therefore the number of cars
on each roadway (link) and the number of passengers on each public transport route.
With this, and the use of the previous step the planner can obtain realistic information/estimates of the
effects of policies and programs on travel demand (for example the introduction of a tunnel connection
between two zones, which are separated through the canal). The planner can assess the performance of
alternative transport systems and identify various impacts that the system will have on the urban area,
such as energy use, pollution and accidents. With this information on how transport systems perform
and the magnitude of their impacts, the planners can provide decision makers (politicians) with some of
the information they need to evaluate alternative methods of supplying the community with transport
services.
The assignment models discussed in this chapter make use of shortest-path algorithms to obtain the
shortest path between each origin and destination. Only one of these methods, Dijkstra’s algorithm, will
be discussed, as it is the most widely used.
Starting from s we ‘walk’ through the network, labelling each node u with the label L(u). L(u) indicates
the length of the temporary shortest path from s to u. The label can still be replaced by a shorter one,
until no other node can be found that decreases L(u); at that point the label becomes definitive.
Let:
V      set of nodes
T      set of nodes with temporary label
s      the origin node
t      the destination node
u, v   a node
e      a link
The algorithm:
Step 1: L(s) := 0 and for all v ∈ V, v ≠ s: L(v) := ∞
Step 2: T := V
Step 3: let u ∈ T for which L(u) is MIN: do
    If L(u) = ∞ then STOP; no solution
    If u = t then T := T - {u} and STOP (L(t) is the length of the shortest path from s to t)
Step 4: for each link e from u to v ∈ T: do
    If L(v) > L(u) + length(e) then
        L(v) := L(u) + length(e)
Step 5: T := T - {u}: return to step 3
[Figure: example network with nodes A, B, C, D, E and F; link lengths 2, 2, 2, 3, 3, 4, 4, 6 and 7]
Assuming a directed graph (one-way traffic from West to East), find the shortest path from A to F. Try to
reconstruct the table below yourself.
A   B   C   D   E   F   T
0   ∞   ∞   ∞   ∞   ∞   {A,B,C,D,E,F}
0   2   ∞   6   7   ∞   {B,C,D,E,F}
0   2   6   4   7   ∞   {C,D,E,F}
0   2   6   4   6   8   {C,E,F}
0   2   6   4   6   8   {E,F}
0   2   6   4   6   8   {F}
The shortest path A-F has a length of 8 units. By recalling the predecessor nodes (for each node, the node
from which its definitive label was set) the route can be reconstructed backward: A-B-D-F.
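The labelling procedure can be sketched in Python as follows. Two assumptions apply. First, the link list below is inferred from the label table above (for example, L(D) dropping from 6 to 4 once B is definitive implies a link B-D of length 2); the two links of length 3 into F are placed at C-F and E-F, which is our reading and not confirmed by the figure. Second, the sketch uses a priority queue instead of scanning the set T for the minimum label, which is a common implementation choice and gives the same labels.

```python
import heapq
import math


def dijkstra(graph, source, target):
    """Return (length, path) of the shortest path from source to target."""
    labels = {node: math.inf for node in graph}  # temporary labels L(u)
    labels[source] = 0
    predecessor = {}
    definitive = set()
    queue = [(0, source)]
    while queue:
        dist, u = heapq.heappop(queue)
        if u in definitive:
            continue  # outdated queue entry for a node that is already definitive
        definitive.add(u)
        if u == target:
            break
        for v, length in graph[u].items():
            if v not in definitive and labels[v] > dist + length:
                labels[v] = dist + length
                predecessor[v] = u
                heapq.heappush(queue, (labels[v], v))
    if math.isinf(labels[target]):
        return math.inf, []  # no solution
    # Reconstruct the route backward from the predecessor nodes.
    path = [target]
    while path[-1] != source:
        path.append(predecessor[path[-1]])
    return labels[target], path[::-1]


# Directed example network (link placement partly assumed, see above).
network = {
    "A": {"B": 2, "D": 6, "E": 7},
    "B": {"C": 4, "D": 2},
    "C": {"F": 3},
    "D": {"E": 2, "F": 4},
    "E": {"F": 3},
    "F": {},
}
length, route = dijkstra(network, "A", "F")
print(length, "-".join(route))
```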
In static traffic assignment models the demand for transport as well as the supply of the infrastructure
network are regarded as independent of time. The demand is spread homogeneously over the routes (in
space and time); therefore they are also called steady-state models. In this course we stick to static
assignment models.
Within static assignment models a differentiation can be made in deterministic or stochastic assignment
models. In the case of deterministic traffic assignment models all travellers are assumed to have the
same perfect knowledge and act on it rationally and identically. Stochastic models on the other hand
allow for imperfect knowledge and taste variation, and are therefore more realistic (but also more
complicated).
Within these methods two other distinctions can be made, i.e. one for uncongested assignment (the so-
called all-or-nothing method if deterministic, or stochastic assignment if stochastic) and one for
congested assignment (the so-called user-equilibrium method if deterministic or stochastic user
equilibrium method if stochastic).
The famous Bureau of Public Roads (BPR) method is often used for link loading, i.e. for travel time vs.
flow:
$$ T_Q = T_0 \left[ 1 + a \left( \frac{V}{V_{max}} \right)^{b} \right] $$
where
TQ   = travel time at the given traffic flow
T0   = free-flow travel time
V    = traffic flow, volume (veh/hr)
Vmax = saturation flow
a, b = parameters
For example, suppose that on a certain stretch of highway (1 km) the free-flow travel time is 0.6 minutes
(100 km/hr), a=0.15, b=4 and the volume is 2400 veh/hr, whereas the saturation flow is 2100 veh/hr. The
travel time then increases from 0.6 minutes under “normal” circumstances to
0.6 * (1 + 0.15 * (2400/2100)^4) ≈ 0.75 minutes, which corresponds to about 80 km/hr.
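The same calculation as a small Python sketch (the function name bpr_travel_time is our own; the numbers are those of the example above):

```python
def bpr_travel_time(free_flow_time, volume, saturation_flow, a=0.15, b=4):
    """BPR link travel time as a function of the traffic flow."""
    return free_flow_time * (1 + a * (volume / saturation_flow) ** b)


time_min = bpr_travel_time(0.6, 2400, 2100)   # travel time in minutes for 1 km
speed_kmh = 1.0 / time_min * 60               # corresponding speed in km/hr
print(round(time_min, 3), round(speed_kmh, 1))
```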
A number of factors are thought to influence the choice of route when driving between two points;
these include journey time, distance, monetary cost (fuel and other), congestion, type of road, scenery,
signposting etc. The production of a generalised cost expression should enable you to incorporate these
factors; though in practice only time and monetary costs are used.
The assignment algorithm itself is the procedure that loads the OD trip table (after modal split) to the
shortest path trees and produces flows VA,B (between nodes A and B).
An example taken from (Ortúzar & Willumsen, 1994): Consider the simple network in figure 7.2a and its
associated simplified OD table:
From/To A B C D
A 0 0 400 200
B 0 0 300 100
C 0 0 0 0
D 0 0 0 0
Section a) shows the travel costs (times) on each link; section b) the corresponding trees based on
these costs together with the contributions to the total flow after assignment; these are shown in
section c).
All-or-nothing assignment is generally of limited interest to the planner; it may be used to represent
some sort of ‘desire line’, i.e. what drivers would like to do in the absence of congestion. However, its
most important practical feature is as a basic building block for other types of assignment techniques,
among others the user-equilibrium assignment discussed next.
If all trip makers perceive costs in the same way: Under equilibrium conditions traffic arranges itself in
congested networks such that all used routes between an O-D pair have equal and minimum costs while
all unused routes have greater or equal costs.
As an example, consider an idealised town with a low-capacity through route via the town centre
(capacity 1000 veh/hr) and a high-capacity bypass (capacity 3000 veh/hr), which is longer but much
faster. Total travel demand is given as V=2000 vehicles.
Figure 7.2 A simple network, its trees and flows from loading a trip table
Figure 7.3 Town served by a bypass and a town centre route
Assume that the absolute capacity restriction for each route is replaced with two corresponding time-
flow relationships:
$$ C_b = 15 + 0.005 V_b $$
$$ C_t = 10 + 0.02 V_t $$
where Cb and Ct are travel costs via the bypass and the town-centre routes respectively, and Vb and Vt
are their corresponding flows (see footnote 17).
The flows on both routes will satisfy Wardrop’s equilibrium when the corresponding costs are identical.
By equating both equations it is possible to find the direct solution to Wardrop’s equilibrium as a
function of the total flow Vb+Vt=V:
$$ 15 + 0.005 V_b = 10 + 0.02 (V - V_b) \;\Rightarrow\; V_b = 0.8 V - 200 $$
From this you can see that both routes will be used when V>250; for a smaller demand all traffic uses the
town-centre route. For example, at V=2000, Vb=1400 and Vt=600 vehicles, and the cost by each route is
22 minutes. The total system costs in this situation (often of interest to the politician or planner) are
1400*22 + 600*22 = 2000*22 = 44,000 minutes. For the planner it might be interesting to lower these
total system costs, and therefore the minimum total system costs for the demand of 2000 vehicles can be
calculated (MIN (Vb*Cb+Vt*Ct); see footnote 18).
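Both situations can be compared with a short script. This is a sketch for this two-route example only; the function names are our own. The system optimum follows from setting the derivative of Vb*Cb + Vt*Ct with respect to Vb to zero, with Vt = V - Vb, which gives 5 + 0.05Vb - 0.04V = 0, i.e. Vb = 0.8V - 100.

```python
def bypass_cost(v_b):
    return 15 + 0.005 * v_b


def town_cost(v_t):
    return 10 + 0.02 * v_t


def user_equilibrium(total):
    """Flows with equal costs on both routes (only the town route is used if V <= 250)."""
    v_b = max(0.0, 0.8 * total - 200)
    return v_b, total - v_b


def system_optimum(total):
    """Flows that minimise Vb*Cb + Vt*Ct, from the zero of the derivative."""
    v_b = max(0.0, 0.8 * total - 100)
    return v_b, total - v_b


def total_cost(v_b, v_t):
    return v_b * bypass_cost(v_b) + v_t * town_cost(v_t)


ue = user_equilibrium(2000)
so = system_optimum(2000)
print(ue, total_cost(*ue))   # user equilibrium
print(so, total_cost(*so))   # system optimum
```

For V=2000 the user equilibrium gives 44,000 minutes and the system optimum 43,750 minutes, so the optimum requires 100 vehicles to accept a longer trip via the bypass.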
This analytical calculation of Wardrop’s equilibrium is only practical if two (or at most three) routes
are considered. An iterative method is therefore necessary for more complex networks. Two iterative
methods are popular: the Incremental Method and the Method of Successive Averages (MSA). Only the
latter is discussed in these notes.
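A minimal sketch of the MSA for the bypass example is given below. It assumes the common formulation in which each iteration performs an all-or-nothing assignment on the current costs and averages the result with the previous flows using a step size of 1/n; the function name and the number of iterations are our own choices.

```python
def msa_two_routes(total, iterations=2000):
    """Method of Successive Averages for the bypass / town-centre example."""
    v_b, v_t = 0.0, 0.0
    for n in range(1, iterations + 1):
        # Current route costs (time-flow relationships of the example).
        c_b = 15 + 0.005 * v_b
        c_t = 10 + 0.02 * v_t
        # All-or-nothing: load the whole demand onto the cheapest route.
        f_b, f_t = (total, 0.0) if c_b <= c_t else (0.0, total)
        # Average the auxiliary flows with the current flows, step size 1/n.
        phi = 1.0 / n
        v_b = (1 - phi) * v_b + phi * f_b
        v_t = (1 - phi) * v_t + phi * f_t
    return v_b, v_t


v_b, v_t = msa_two_routes(2000)
print(round(v_b), round(v_t))
```

The flows oscillate around the equilibrium with an amplitude that shrinks roughly in proportion to 1/n, which is why the MSA converges slowly but steadily towards Vb=1400 and Vt=600.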
If we apply this method to the town-centre example from (Ortúzar & Willumsen, 1994) we get the
following table (see footnote 19):
17 Draw both time-flow diagrams in one figure!
18 Prove that the minimum total system costs are at 43,750 minutes (with Vb=1500 and Vt=500).
19 Recalculate this yourself.
The Method of Successive Averages can still be used for the stochastic user equilibrium (SUE), but now
in every iteration the travel times are randomly determined as follows:
$$ C_a = c_a + z \, j \, c_a $$
where Ca is the resistance perceived by the traveller/user, ca the deterministic resistance (the
physical resistance due to, for example, the length of link a), j a factor indicating the variation
(standard deviation) of the uncertainty interval, and z a random number drawn from the N(0,1) distribution.
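Drawing a perceived cost for one link can be sketched as follows. The function name perceived_cost and the example values (a link cost of 10 and a variation factor of 0.1) are hypothetical.

```python
import random


def perceived_cost(cost, variation, rng):
    """Perceived link cost C_a = c_a + z * j * c_a, with z drawn from N(0,1)."""
    z = rng.gauss(0.0, 1.0)
    return cost + z * variation * cost


rng = random.Random(42)  # fixed seed so that the draws can be reproduced
draws = [perceived_cost(10.0, 0.1, rng) for _ in range(5)]
print([round(c, 2) for c in draws])
```

In each MSA iteration such a draw would be made for every link before the shortest paths are searched, so that different iterations can load different routes.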
[Figure 8.1: study area with zones 1 to 7]
In figure 8.1 the (hypothetical) study area for this case is given. It is divided into 7 zones, each with a
centroid. The network between the centroids is given, including 1 centroid connector in zone 1.
The trip production and attraction, as well as the socio-economic and land-use data are known for the
base-year. These will be used in the first stage to forecast the future production and attraction.
To calculate trip production for future years a simple linear regression is performed, which gives an
equation of the type:
y = a*x + b
For zone i this leads to:
Pi = a * Popi + b
The coefficients a and b can be derived from the linear regression, and the formula becomes:
Pi = 3.56 * Popi - 554.05
The trip attraction formula will be derived using multiple linear regression, with equation type:
y = a * x1 + b * x2 + c
The regression analysis can be performed with SPSS. This is explained in box 8.1.
To get started the data given in table 8.1 is entered, which is shown in the next figure:
An output file is generated, containing the following tables from which the R² and the coefficients
can be read.
Model Summary
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B           Std. Error        Beta                        t         Sig.
1   (Constant)    -554.054    515.049                                       -1.076    .331
    POP           3.555       .135              .996                        26.413    .000
a. Dependent Variable: PROD
The same procedure can be followed for the attraction:
Model Summary
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B           Std. Error        Beta                        t         Sig.
1   (Constant)    518.528     312.317                                       1.660     .172
    FSP           64.678      2.570             .692                        25.163    .000
    EMPL          3.286       .229              .395                        14.359    .000
a. Dependent Variable: ATTR
With these formulas the production and attraction in the future year can be calculated, using socio-
economic and land-use forecast for the future year. These forecasts are given in the following table in
which also the new production and attraction are calculated.
As can be seen the total production is not equal to the total attraction, so balancing has to be
performed. The attraction has to be multiplied with a factor f as calculated below:
$$ f = \frac{\sum_i P_i}{\sum_j A_j} = \frac{120721.7}{134490.2} = 0.8976 $$
In the following table the production and attraction are balanced and now the next stage can be
performed.
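The balancing step can be sketched as follows. The function name is our own and the three-zone values are hypothetical; only the two totals 120721.7 and 134490.2 are taken from the calculation above.

```python
def balance_attractions(productions, attractions):
    """Scale the attractions so that their total equals the total production."""
    factor = sum(productions) / sum(attractions)
    return factor, [a * factor for a in attractions]


# Check of the factor using the totals from the text.
f, _ = balance_attractions([120721.7], [134490.2])
print(round(f, 4))

# Hypothetical three-zone example.
factor, balanced = balance_attractions([500, 300, 200], [400, 450, 250])
print(round(factor, 4), [round(a, 1) for a in balanced])
```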
[Figure: network of the study area with link costs]
[Figure: distribution function, f(cij) plotted against cij for cij from 1 to 19]
The deterrence between the zones cij can be derived by building a shortest-path-tree. The following
table shows the resulting matrix for cij.
1 2 3 4 5 6 7
1 - 5.0 7.5 7.5 10.0 12.5 16.0
2 5.0 - 3.5 2.5 5.0 7.5 11.0
3 7.5 3.5 - 4.0 7.0 8.0 13.0
4 7.5 2.5 4.0 - 3.0 5.0 9.0
5 10.0 5.0 7.0 3.0 - 9.0 6.0
6 12.5 7.5 8.0 5.0 4.0 - 6.0
7 16.0 11.0 13.0 9.0 6.0 6.0 -
Then the start matrix is determined using the above distribution function; each cell represents the value
of f(cij) for the specific OD-pair.
1 2 3 4 5 6 7
1 - 0.04 0.017778 0.017778 0.01 0.0064 0.003906
2 0.04 - 0.081633 0.16 0.04 0.017778 0.008264
3 0.017778 0.081633 - 0.0625 0.020408 0.015625 0.005917
4 0.017778 0.16 0.0625 - 0.111111 0.04 0.012346
5 0.01 0.04 0.020408 0.111111 - 0.012346 0.027778
6 0.0064 0.017778 0.015625 0.04 0.0625 - 0.027778
7 0.003906 0.008264 0.005917 0.012346 0.027778 0.027778 -
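The tabulated values are consistent with a deterrence function f(cij) = 1/cij² (for example cij = 5.0 gives 0.04 and cij = 2.5 gives 0.16). This functional form is inferred from the numbers and is not stated explicitly in these notes, so the sketch below should be read under that assumption. It recomputes the cells for zones 1 to 3.

```python
def deterrence(cost):
    """Assumed distribution function f(c) = c^-2, inferred from the tabulated values."""
    return cost ** -2


def start_matrix(costs):
    """Apply the distribution function to every OD-pair; None marks intrazonal cells."""
    return [[None if c is None else deterrence(c) for c in row] for row in costs]


# Costs c_ij between zones 1, 2 and 3 from the cost matrix above.
costs_123 = [
    [None, 5.0, 7.5],
    [5.0, None, 3.5],
    [7.5, 3.5, None],
]
f_123 = start_matrix(costs_123)
for row in f_123:
    print(["-" if v is None else round(v, 6) for v in row])
```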
Initialisation:
Step 1: L(1) = 0, L(2,3,4,5,6,7) = ∞
Step 2: T = {2,3,4,5,6,7}
Step 3: L(2) = 5, L(3) = 7.5; MIN: L(2)
Step 4: Calculate paths via links from node 2:
Path         L(2)   L(3)   L(4)   L(5)   L(6)   L(7)
Old          5      7.5    ∞      ∞      ∞      ∞
Calculated   -      8.5    7.5    10     -      -
New          5      7.5    7.5    10     ∞      ∞
The shortest paths from node 1 to node 2,3,4,5,6,7 can be read from the last table.
In the following tables the gravity model will be estimated using Furness iterations.
Destination
Origin 1 2 3 4 5 6 7 total target a
1 0,000 0,040 0,018 0,018 0,010 0,006 0,004 0,096 24365,95 254177,4
2 0,040 0,000 0,082 0,160 0,040 0,018 0,008 0,348 9413,95 27076,87
3 0,018 0,082 0,000 0,063 0,020 0,016 0,006 0,204 16889,95 82850,32
4 0,018 0,160 0,063 0,000 0,111 0,040 0,012 0,404 14397,95 35661,88
5 0,010 0,040 0,020 0,111 0,000 0,012 0,028 0,222 29349,95 132419,9
6 0,006 0,018 0,016 0,040 0,063 0,000 0,028 0,170 16889,95 99305,33
7 0,004 0,008 0,006 0,012 0,028 0,028 0,000 0,086 9413,95 109478,5
total 0,096 0,348 0,204 0,404 0,272 0,120 0,086 1,529
target 7545,799 33703,09 9292,136 30907,12 13368,35 14285,09 11620,06 120721,7
b 78715,22 96938,5 45580,74 76552,99 49185,05 119114,9 135134,3
Table 8.6 Gravity model: initialisation
Destination
Origin 1 2 3 4 5 6 7 total target a
1 0,00 10167,09 4518,77 4518,77 2541,77 1626,74 992,82 24365,95 24365,95 1
2 1083,07 0,00 2210,37 4332,30 1083,07 481,37 223,76 9413,95 9413,95 1
3 1472,91 6763,32 0,00 5178,15 1690,81 1294,54 490,23 16889,95 16889,95 1
4 634,00 5705,90 2228,87 0,00 3962,43 1426,48 440,28 14397,95 14397,95 1
5 1324,20 5296,80 2702,43 14713,31 0,00 1634,86 3678,36 29349,95 29349,95 1
6 635,55 1765,45 1551,65 3972,21 6206,58 0,00 2758,50 16889,95 16889,95 1
7 427,62 904,73 647,78 1351,62 3041,09 3041,09 0,00 9413,95 9413,95 1
total 5577,36 30603,29 13859,85 34066,36 18525,76 9505,07 8583,95 120721,7
target 7545,799 33703,09 9292,136 30907,12 13368,35 14285,09 11620,06 120721,7
b 1,352933 1,10129 0,670435 0,907262 0,721609 1,502892 1,353696
Table 8.7 Gravity model: step 1
Destination
Origin 1 2 3 4 5 6 7 total target a
1 0,00 11196,92 3029,54 4099,71 1834,17 2444,81 1343,97 23949,11 24365,95 1,017405
2 1465,33 0,00 1481,91 3930,53 781,56 723,45 302,91 8685,68 9413,95 1,083847
3 1992,75 7448,38 0,00 4697,94 1220,10 1945,55 663,62 17968,33 16889,95 0,939984
4 857,76 6283,85 1494,31 0,00 2859,32 2143,84 596,01 14235,08 14397,95 1,011441
5 1791,55 5833,31 1811,80 13348,83 0,00 2457,01 4979,38 30221,89 29349,95 0,971149
6 859,86 1944,27 1040,28 3603,84 4478,72 0,00 3734,18 15661,15 16889,95 1,078462
7 578,55 996,37 434,30 1226,28 2194,48 4570,44 0,00 10000,41 9413,95 0,941357
total 7545,80 33703,09 9292,14 30907,12 13368,35 14285,09 11620,06 120721,6
target 7545,799 33703,09 9292,136 30907,12 13368,35 14285,09 11620,06 120721,7
b 1 1 1 1 1 1 1
Table 8.8 Gravity model: step 2
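The two steps of tables 8.7 and 8.8 can be reproduced with the sketch below. It uses the cost matrix and the balanced trip-end targets given above, and assumes the distribution function f(cij) = 1/cij², which is inferred from the start matrix and not stated explicitly in these notes. The function names are our own. Small differences in the last digits with the tables are due to rounding of the targets.

```python
COSTS = [
    [0, 5.0, 7.5, 7.5, 10.0, 12.5, 16.0],
    [5.0, 0, 3.5, 2.5, 5.0, 7.5, 11.0],
    [7.5, 3.5, 0, 4.0, 7.0, 8.0, 13.0],
    [7.5, 2.5, 4.0, 0, 3.0, 5.0, 9.0],
    [10.0, 5.0, 7.0, 3.0, 0, 9.0, 6.0],
    [12.5, 7.5, 8.0, 5.0, 4.0, 0, 6.0],
    [16.0, 11.0, 13.0, 9.0, 6.0, 6.0, 0],
]
PRODUCTIONS = [24365.95, 9413.95, 16889.95, 14397.95, 29349.95, 16889.95, 9413.95]
ATTRACTIONS = [7545.799, 33703.09, 9292.136, 30907.12, 13368.35, 14285.09, 11620.06]


def scale_rows(trips, productions):
    """Multiply every row by a_i so that the row totals equal the productions."""
    return [[t * target / sum(row) for t in row]
            for row, target in zip(trips, productions)]


def scale_columns(trips, attractions):
    """Multiply every column by b_j so that the column totals equal the attractions."""
    n = len(trips)
    totals = [sum(trips[i][j] for i in range(n)) for j in range(n)]
    return [[trips[i][j] * attractions[j] / totals[j] for j in range(n)]
            for i in range(n)]


# Start matrix: assumed f(c) = c^-2, with no intrazonal trips on the diagonal.
start = [[0.0 if i == j else COSTS[i][j] ** -2 for j in range(7)] for i in range(7)]
step1 = scale_rows(start, PRODUCTIONS)      # compare with table 8.7
step2 = scale_columns(step1, ATTRACTIONS)   # compare with table 8.8

# Continue the Furness iterations until the matrix is practically balanced.
trips = start
for _ in range(20):
    trips = scale_columns(scale_rows(trips, PRODUCTIONS), ATTRACTIONS)
print(round(step1[0][1], 2), round(step2[0][1], 2), round(trips[0][1]))
```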
As we can see in table 8.9, the accuracy after three steps is higher than 97%.
1 2 3 4 5 6 7
1 0 11392 3082 4171 1866 2487 1367
2 1588 0 1606 4260 847 784 328
3 1873 7001 0 4416 1147 1829 624
4 868 6356 1511 0 2892 2168 603
5 1740 5665 1760 12964 0 2386 4836
6 927 2097 1122 3887 4830 0 4027
7 545 938 409 1154 2066 4302 0
Table 8.10 OD-matrix
The travel times and cost per OD-pair are given in the following table:
$$ P_i = \frac{e^{U(i)}}{\sum_j e^{U(j)}} $$
with Pi the chance to choose mode i.
In the following tables the chance to choose the car is calculated per OD-pair for both market segments
using the utility functions already mentioned. We are only interested in the car, since we only
consider the car network in the study area.
Pij(car), segment 1
  1 2 3 4 5 6 7
1 0.9241 0.94075 0.95073 0.9346 0.94685 0.94213 0.9475
2 0.9554 0.92414 0.95271 0.9396 0.94979 0.94506 0.9447
3 0.9507 0.9372 0.92414 0.9354 0.94588 0.94449 0.94125
4 0.9507 0.93963 0.95129 0.9241 0.95343 0.94979 0.94868
5 0.9468 0.93339 0.94588 0.9381 0.92414 0.95175 0.95382
6 0.9421 0.92724 0.94449 0.9334 0.95175 0.92414 0.95426
7 0.9475 0.92811 0.94125 0.9319 0.95382 0.95426 0.92414
Pij(car), segment 2
  1 2 3 4 5 6 7
1 0.97069 0.973403 0.977302 0.97069 0.975873 0.973979 0.974852
2 0.97942 0.970688 0.97886 0.97372 0.977577 0.975636 0.974913
3 0.9773 0.972682 0.970688 0.97174 0.975932 0.975517 0.973661
When we assume that the average auto occupancy is 1.4 passengers, the total amount of trips by car
can be determined with the following formula:
Tij (car) 1 2 3 4 5 6 7
1 0 7822 2130 2852 1286 1709 942
2 1095 0 1105 2903 582 536 224
3 1293 4790 0 3017 789 1257 428
4 595 4319 1037 0 1988 1485 413
5 1202 3882 1215 8910 0 1653 3354
6 634 1418 769 2641 3327 0 2778
7 376 640 281 789 1432 2983 0
Black, 1981: Urban transport planning – theory and practice, Croom Helm, London, England
Bovy & van der Zijp, 1999: Transportation modelling, Lecture note CTvk4800, Delft University of
Technology, the Netherlands.
Hensher & Button [Eds.], 2000: Handbook of transport modelling, Elsevier Science Ltd., UK
Immers & Stada, 1999: Verkeersmodellen, Lecture note H111, Catholic University of Leuven,
Belgium (in Dutch).
Manheim, 1979: Fundamentals of transportation systems analysis, The MIT press, USA
Meyer & Miller, 2001: Urban transportation planning: a decision-oriented approach, McGraw-Hill
series in transportation, USA
Ortúzar [Ed.], 1992: Simplified transport demand modelling, PTRC Education and Research services
Ltd., England
Ortúzar & Willumsen, 1994: Modelling Transport, 2nd. Ed. Wiley & Sons, England.
Pas, 1986: The urban transportation planning process. In Hanson S. (eds.) The geography of urban
transportation Guilford Press, NY, 49-70.
Rodrigue, Comtois & Slack, 2006: The geography of transport systems. Routledge, England.
Slavin, 2003: The Role of GIS in Land Use and Transport Planning. In: Hensher, Button, Haynes &
Stopher. Handbook of Transport Geography and Spatial Systems, (Handbooks in Transport, Volume
5), Elsevier Science, The Netherlands.
Taylor, Young & Bonsall, 1996: Understanding traffic systems: data, analysis and presentation
Ashgate Publishing Ltd., England
Tolley & Turton, 1995: Transport systems, policy and planning: a geographical approach, Longman,
England
Zuidgeest, 2005. Sustainable urban transport development: a dynamic optimisation approach. TRAIL
Research School, The Netherlands.